ABSTRACT


We have developed a combined statistical-dynamical prediction scheme 
to predict the probability of tropical cyclone (TC) formation at daily, 2.5° 
horizontal resolution across the western North Pacific at intraseasonal lead 
times. Through examination of previous research and our own analysis, we 
chose five variables to represent the favorability of the climate system to support 
tropical cyclogenesis. These so-called large-scale environmental factors 
(LSEFs) include: low-level relative vorticity, sea surface temperature, vertical 
wind shear, Coriolis, and upper-level divergence. Logistic regression was 
employed to generate a statistical model representing the probability of TC 
formation at every grid point based on these LSEFs. Thorough verification of
zero-lead hindcasts reveals that this model displays skill and potential value for
risk-averse customers. In particular, these hindcasts had a positive Brier skill score
of 0.03 and a skillful relative operating characteristic skill score of 0.68. The fully
coupled, one-tier NCEP Climate Forecast System was used as the dynamical 
model with which to forecast the LSEFs and, in turn, force the regression model. 
A series of individual TC case studies was conducted to demonstrate the
predictive potential, at intraseasonal leads, of our statistical-dynamical method. 
Lastly, we investigated the applicability of intraseasonal forecasts to military 
planning. 

I. INTRODUCTION


A. MOTIVATION 

Three months before the kickoff of the Valiant Shield (VS) naval exercise,
a group of U.S. Navy planners gathers in a small conference room at Pearl
Harbor to compare notes. The meeting scrubs the logistics and rules of
engagement for this large-scale, joint-forces event held in the tropical western
North Pacific region near Guam. Hours later, an environment-savvy planner
asks, “Is the weather going to cooperate?” He continues, “How might
tropical cyclones affect the ability of the different platforms to operate in the
designated exercise area and period?”

This meeting may be hypothetical, but those questions are exactly the 
type that military planners should be asking and that Department of Defense 
(DoD) weather centers should be capable of answering with confidence. 
Unfortunately, no suitable products currently exist to answer such questions. 
Such mission planning well in advance of the operation(s) is not unusual in the 
DoD. Though this example depicts a complex exercise, the same environmental 
intelligence should be exploited for a multitude of missions, such as planning 
flight qualification training at long leads or establishing a CORONET transoceanic
air bridge.

A gap clearly exists in DoD weather support for forecasts with lead times 
on the order of weeks to months. Consider the potential benefit—in dollars, 
hours, morale, etc.—if weather forecasters were able to provide those planners 
with a regional outlook for tropical cyclone activity and an idea of avenues of safe 
passage through the western North Pacific. This thesis will investigate one such 
approach that would benefit this scenario. 

B. CLIMATE PREDICTION PROCESS


1. Syntax and Definitions 

Below are definitions for and discussions of some key terms that are used 
in this paper. 


a. Climatology 

While climatology literally refers to the description and scientific 
study of climate (Glickman 2000), this term is used in this work to refer to a 
quantitative description of an element in terms of a long-term average; for
example, the frequency of cyclone formation for a given grid box in a region for a 
given unit of time. Climatology is also used throughout this thesis as the baseline 
forecast against which our methods will be compared. Appendix A includes a 
more in-depth discussion on the variations in the methods used to calculate 
climatologies. 


b. Intraseasonal 

Used in reference to a subset of forecast products and associated 
lead times, intraseasonal comprises a period bounded by a single season or
other three-month period. Often referred to as long-range forecasting,
subseasonal forecasting, or short-term climate prediction, the lead times for 
intraseasonal products and techniques are typically longer than two weeks, but 
less than three months. 

c. Prediction 

The word prediction is readily used interchangeably with a form of 
the word forecast. Though such use may be grammatically correct, we use the 
word prediction to denote a quantitative scientific estimate of future climate 
conditions that has skill. A forecast, in contrast, is used to refer to both the 
prediction process, regardless of perceived skill, and the deliverable that results. 

The difference here between forecast and prediction may be more
psychological than meteorological. The customers for forecasts and predictions 
(e.g., military operators, the general public, etc.) expect that weather forecasts 
are readily available (e.g., a five-day forecast from the local news media). In 
contrast, a description of the future state of the climate system may be best 
thought of as a prediction that is issued only if the prediction has some perceived 
skill beyond a baseline forecast (e.g., a forecast of climatological conditions). In
that context, a customer should not always expect a prediction that varies from 
the long-term mean (LTM) for temperature over the forthcoming summer in the 
same way he expects a local forecast for tomorrow’s high temperature. 

d. Tropical Cyclone 

In the most general form, tropical cyclone refers to a closed, 
cyclonic circulation with its origins over a tropical ocean basin. Tropical cyclones 
(TCs) are classified according to their intensity, and these classifications vary 
somewhat by ocean basin. In the western North Pacific (WNP), a tropical 
depression is characterized by winds up to 17 m s⁻¹, a tropical storm has winds of
18 m s⁻¹ to 32 m s⁻¹, a typhoon has winds of 33 m s⁻¹ to 66 m s⁻¹, and a super typhoon
has winds that exceed 66 m s⁻¹.
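
The WNP intensity classes above can be expressed as a simple lookup. The following is a minimal sketch only; the function name is hypothetical, and the treatment of fractional wind speeds that fall between the quoted whole-number bands (assigned here to the lower class) is an assumption, since the text gives only integer thresholds.

```python
def classify_wnp_tc(wind_ms: float) -> str:
    """Return the WNP intensity class for a wind speed in m/s.

    Thresholds follow the text: up to 17 (depression), 18-32 (storm),
    33-66 (typhoon), above 66 (super typhoon). Values between bands,
    such as 17.5, are assigned to the lower class (an assumption).
    """
    if wind_ms < 0:
        raise ValueError("wind speed must be non-negative")
    if wind_ms > 66:
        return "super typhoon"
    if wind_ms >= 33:
        return "typhoon"
    if wind_ms >= 18:
        return "tropical storm"
    return "tropical depression"
```

For example, a system with 40 m s⁻¹ winds would be labeled a typhoon under this sketch.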

2. Elements of Operational Climate Prediction 

The basis of this thesis is the exploratory development of a state-of-the- 
science climate prediction system for likely TC formation areas in a given 
geographical region. Though the idea of climate prediction—intraseasonal or 
otherwise—is not new, no available resources clearly outline the prediction 
process. 

Figure 1. Schematic of the climate prediction process.


Figure 1 provides a schematic description of a state-of-the-science
approach to developing operational climate predictions. As presented, this
process is generic and may be applied to various meteorological or 
oceanographic elements and over various time scales. The flow of the arrows in 
the diagram indicates that the process is fluid, and often iterative in nature. 
Though the process may vary somewhat in specific cases, the depicted steps are 
all important to the development of an operational deliverable. 

3. Methods of Prediction 

Though the Forecast Method Development step is only one step in the 
process depicted in Figure 1, the development of the forecast method is likely the 
most challenging component of the climate prediction process. Three primary 
categories of predictive methods exist in operational intraseasonal/seasonal 
forecasting: statistical, dynamical, and a combined statistical-dynamical
approach.

a. Statistical

Whether the approach is projecting average conditions, 
constructing analogues, or applying empirical orthogonal functions, statistical 
methods are widely used in climate prediction. These and many other statistical
(also called empirical) methods use existing data sets to develop
predictive methods based on past conditions. Such methods are mainstays for
intraseasonal and seasonal climate prediction at the National Weather Service’s 
Climate Prediction Center (CPC) and other climate prediction centers (van den 
Dool 2007). 


b. Dynamical 

Numerical weather prediction may be the standard for day-to-day 
weather forecasting, but dynamical methods in intraseasonal and seasonal 
climate prediction are often less skillful than comparable statistical methods. Van
den Dool (2007) notes that in 2005, the National Centers for Environmental
Prediction (NCEP) presented an award to a group of its employees for 
developing the Climate Forecast System (CFS; to be discussed in Section II.B.4) 
that led to “the first time in history (in which) numerical seasonal predictions were 
on par with empirical methods.” 

The CFS belongs to a class of numerical models known as general 
circulation models (GCMs). Many GCMs were built to focus on global climate 
issues; therefore, they struggle when applied regionally at intraseasonal time 
scales. Coarse resolution, limited parameterizations, and systematic model 
errors all translate into limited operational use of many of the GCMs. However, 
one advantage of a GCM vice a purely statistical method is the ability of the 
numerical model to explicitly account for nonlinear processes (van den Dool 
2007). The reader is directed to van den Dool (2007) for an informative 
discussion of the relative performance of GCMs compared to empirical methods. 

c. Combined

The wide use of statistical techniques in short-term climate
prediction raises the question of whether there is any benefit to using a GCM
or a combined statistical-dynamical approach, or whether a purely statistical forecast
would perform just as well. A combined methodology is potentially superior,
since it can incorporate the advantages of both
approaches. The method used in this thesis uses a GCM to
predict the large-scale environmental factors (LSEFs) that affect TC
formation, and then uses these predicted LSEFs to force a statistical model that has been
trained on many years of TC and LSEF data to estimate the probability of TC
formation.
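
The forcing step can be illustrated with a short sketch of a logistic model evaluated on gridded predictors. This is an illustration under stated assumptions, not the fitted model from this thesis: the function name, the coefficient values, the intercept, and the 12 x 36 array of grid boxes are all hypothetical placeholders, and the random predictor field merely stands in for standardized GCM output.

```python
import numpy as np

def formation_probability(lsefs, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_k b_k * x_k))).

    lsefs has shape (..., n_predictors); the result drops the last axis.
    """
    lsefs = np.asarray(lsefs, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    z = intercept + lsefs @ coefficients
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder inputs (assumptions, not values from the thesis).
rng = np.random.default_rng(0)
predicted_lsefs = rng.normal(size=(12, 36, 5))   # lat x lon x five LSEFs
beta = np.array([0.8, 0.5, -0.6, 0.3, 0.4])      # hypothetical coefficients
prob = formation_probability(predicted_lsefs, beta, intercept=-4.0)
```

The output is one probability per grid box, which is the form of product described in the abstract. In practice the coefficients would come from fitting the regression to the historical TC and LSEF data.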

C. EXISTING PRODUCTS 

1. Seasonal 

Seasonal predictions of TC activity forecast the overall character for an 
entire TC season within an entire basin (e.g., the total number of TCs in a WNP 
TC season). The lead times for seasonal predictions are approximately zero to 
six months. Among the earliest seasonal tropical cyclone predictions were those 
produced at Colorado State University in the 1980s for the Atlantic basin. 
Prediction techniques have continued to develop and expand since these early 
forecasts, and now include other ocean basins (Camargo 2006). Though 
seasonal prediction is not the focus of this thesis, these existing products are 
briefly mentioned here as they provide much of the framework from which the 
newer intraseasonal products are derived. Seasonal TC forecast products for 
the WNP are generated at various centers using statistical and dynamical 
methods. The following listing of centers and products is by no means all- 
inclusive, but provides a glimpse into the spectrum of participants and 
approaches. 

a. Statistical

The City University of Hong Kong has issued seasonal forecasts for 
the number of storms in the WNP basin since 1997. They use several 
environmental conditions, the most prominent of which are El Niño and the
Pacific subtropical ridge, in order to forecast the number of TCs (Chan et al.
2001). Tropical Storm Risk, a consortium based in the United Kingdom, also issues
statistical forecasts for TC activity in the WNP. In addition, they generate a
forecast of the NW Pacific accumulated cyclone energy (ACE) index, based in
large part on conditions in the Niño 3.75 region (Lloyd-Hughes et al. 2004).

b. Dynamical 

The European Centre for Medium-range Weather Forecasts 
(ECMWF) issues seasonal forecasts of TC activity based on coupled ocean- 
atmosphere models (Vitart and Stockdale 2001). The International Research 
Institute for Climate and Society (IRI) also generates seasonal forecasts of TC 
frequency, but uses a “two-tier” approach. The first step, or tier, entails 
employing various statistical and dynamical models to forecast future sea surface 
temperature (SST) conditions. Then, the predicted SSTs are used to force 
numerical atmospheric models. Detection algorithms are then used to identify 
TC-like features from amidst the coarse-resolution output fields (Camargo and 
Zebiak 2002). 

2. Intraseasonal 

Intraseasonal predictions of TC activity forecast the activity for 
intraseasonal periods (e.g., two weeks to two months) within an ocean basin. TC 
prediction at intraseasonal time scales is a comparatively new area of research, 
which may be attributed to increasing model resolution, improving ensemble 
techniques, and continuing research into intraseasonal climate oscillations. 
Many of the centers noted in the seasonal section are active in the intraseasonal 
realm as well. 

a. Non-DoD Products

On the intraseasonal time scale, the Madden-Julian oscillation 
(MJO) presents the greatest predictive potential for empirical approaches. Useful
predictive skill for statistical methods is on the order of 15 to 20 days, limited by
the signal-to-noise ratio of the MJO (Camargo 2006). Frank and Roundy (2006)
look beyond MJO alone and generate daily probabilities of formation using a 
wide variety of wave modes and climate signals. More recently, Leroy and 
Wheeler (2008) used logistic regression in a purely statistical prediction scheme 
to predict the probability of TC formation in fixed zones of the Southern
Hemisphere. Their predictors include one representing a smoothed 
climatological cycle, two representing the propagation of MJO, and two 
representing the leading patterns of SST variability. 

Despite the promise of the budding statistical methods, Camargo 
(2006) contends that “while there is much room for improvement in the skill and 
application of empirical/statistical methods of intra-seasonal TC prediction, the 
greatest hope for improvement lies with dynamical/numerical models.” One of 
the most promising players in the dynamical field is the ECMWF and their 
Ensemble Prediction System (EPS). 

Only a few centers create subseasonal forecasts, and even fewer do so
operationally and make the forecasts freely available online. The CPC is among
this select group, with its operational Global Tropics Benefits/Hazards 
Assessment product. 

Figure 2. Example CPC Global Tropics Benefits/Hazards Assessment,
issued by CPC/NCEP on 6 August 2007 and valid 14-20 August 2007
(From http://www.cpc.noaa.gov/products/precip/CWlink/ghazards/;
accessed 12 January 2009).


Figure 2 is an example of the Global Tropics Benefits/Hazards
Assessment issued by CPC. This product includes both a graphical depiction (as
shown in Figure 2) and accompanying text that explains the assessment.
The description for the highlighted area in the WNP labeled region “4” states
(From http://www.cpc.noaa.gov/products/precip/CWlink/ghazards/; accessed 12
January 2009):

The potential for tropical cyclone development northeast of the 
Philippines and in the South China Sea. Active convection is 
expected in this area and with areas of anticipated weak vertical 
wind shear and above average SSTs the prospects for tropical 
cyclogenesis are increased. Confidence: Moderate. 

Though this product makes strides with providing outlooks for 
impacts on TC activity due to the forecasted state of the tropical climate system, 
this product is limited by its subjective combination of forecast tools. Plans for 
this product include making it more objective in nature (Gottschalck et al. 2008).

b. DoD Products

As of this writing, no DoD centers are actively issuing forecasts at 
seasonal or intraseasonal leads for the tropics. The Joint Typhoon Warning 
Center (JTWC) is the DoD agency responsible for issuing tropical cyclone 
warnings for the Indian and Pacific Oceans. Products produced by JTWC are 
intended for use in decision making by operational military units, though most of 
these products are nowcasts and/or short-term forecasts. 

The Fleet Numerical Meteorological and Oceanographic 
Detachment - Asheville (FNMOD) is another logical place for operators/planners 
to turn for information regarding future tropical activity. FNMOD does maintain a 
Mariners’ Worldwide Climate Guide to Tropical Storms at Sea, which appears to 
be a form of climatology for each basin broken down into 10-15 day periods 
(depending on the time of year). This guide is certainly better than having 
nothing at all, but contains no information about the current or forecasted state of 
the climate system. 

Collocated with FNMOD is the Air Force’s 14th Weather Squadron
(14WS; formerly known as the Air Force Combat Climatology Center (AFCCC)). 
While the 14WS recently began issuing long-range forecasts for select locations 
(i.e., Iraq, Afghanistan), no products concerning the current or forecasted state of 
the tropical climate system in general, or TCs in particular, are available. 

D. RESEARCH MOTIVATION AND SCOPE 
1. Prior Work 

The idea for taking a combined statistical-dynamical approach for 
predicting likely cyclogenetic regions in the tropics evolved from thesis work by 
Meyer (2007). Though his study did not venture into the realm of forecasting, 
Meyer used logistic regression to calculate the probability of TC formation in 
weekly five-degree latitude by five-degree longitude grid blocks as a way of 
quantifying the impacts of changes in the large-scale environment on the 
likelihood of TC formation. 

Figure 3. Example figure generated using methods from Meyer (2007), titled
“TC Formation Probability for 2006 Week 26; Thresholds not considered.”
Contours show the zero-lead hindcast probabilities of TC formation for
week 26 of 2006, at 0.01, 0.25, 0.40, and 0.55. The
red dot represents a verifying TC formation location.

Figure 3 is an example plot after Meyer’s work. Such plots resulted in a 
perceived forecast potential and a methodological basis for this thesis work. 

2. Research Questions 

This thesis is an exploration into the viability of the prescribed
methodology as a predictive tool at intraseasonal time scales. While many
sub-questions exist, this work will primarily focus on investigating the following two
questions:

1) Can favorable regions for tropical cyclogenesis be predicted at 
intraseasonal lead times, by way of forcing a statistical model with available 
output from a GCM? 

2) Does a combined statistical-dynamical approach appear to result in 
skill and value beyond that which basic climatology provides? 

As one can deduce from the preceding questions, this work will
concentrate on methodology and stand as a “proof of concept.” This thesis is not 
an attempt to advance the science of tropical dynamics; however, it may 
indirectly contribute to an improved understanding of, and ability to model, the 
large-scale environmental factors that affect TC activity.

3. Thesis Organization 

In order to answer the two aforementioned research questions, this work 
will focus on the following steps of the climate prediction process (see Figure 1): 
Data Selection, Climate System Analysis, Forecast Model Development, 
Hindcast/Forecast, and Verification/Evaluation. 

Chapter II begins by defining the study region and time period, and then 
provides a brief look at the numerous data sets used in this study, as well as the 
methods used in developing and testing our predictive model. Also included in 
Chapter II is a summary of the large-scale variables thought to impact TC 
formation. Chapter III outlines the results of the model development and 
hindcasting; in addition, Chapter III demonstrates the predictive potential of the
model by way of a pair of case studies. Chapter IV provides a summary of our 
results and conclusions, and offers suggestions for future research. 

To make this thesis purposefully concise, several topics are only 
mentioned briefly in the text but covered more at length in the appendices. The 
following topics are appended to this work as references for the reader: 
climatology development and selection (Appendix A), calculation of variables 
from available output fields (Appendix B), and plots from additional case studies 
(Appendix C). 

II. DATA AND METHODS


A. STUDY REGION 

The western North Pacific (WNP) was chosen as the focus region for this 
study. Our analyses of JTWC best track TC data from 1970-2007 indicate that 
an average of 30 TCs—tropical depressions through super typhoons—form per 
year, with a standard deviation of 4.8 storms. With that, the WNP has the 
highest average number of TCs annually of all basins, and accounts for nearly 
30% of global annual total TCs (Chan 2004). The WNP is also the only ocean 
basin wherein TC formation is observed throughout the year, although the 
majority of cyclones develop between June and November (Frank 1987). This 
study region was also chosen for its economic and military importance. 


Figure 4. Depiction of the WNP study region (outlined by the blue box) and
TC formation points (red dots), constructed from JTWC WNP best track
data from years 1970-2007.

The study region extends from 100°E to 190°E (170°W) and from the 
Equator to 30°N, as depicted in Figure 4. No literature standard exists for 
defining the WNP basin; however, the bounds for our study region differ no more 
than 10° in any one direction from the majority of other sources. One reason that
our bounds differ slightly from those of other studies of the WNP is that, in
focusing on genesis locations, there is no need to allow for the recurvature of
TCs post-formation. Defining restrictive bounds also minimizes the potential
impacts of data dilution in our statistical verification.

Figure 5. Number of TC formations versus day of year, constructed with
JTWC WNP best track data from years 1970-2007.

As noted earlier, TC formations are observed throughout the year in the 
WNP. Figure 5 displays the variation in the number of TC formations in the WNP 
during the period for a given Julian day. This figure also highlights the unequal 
distribution of formations over the course of the year. If one defines the peak 
formation period as June through November (as in Frank 1987), those months 
account for 936 of the 1122, or 83%, of the TCs; in contrast, a peak formation 
period of July through October (as in Sobel and Camargo 2005) accounts for 761 
of the 1122, or 68%, of TC formations. Hereafter in this study, the peak formation
season will be defined as a period encompassing the months June through 
November. 
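
The seasonal shares quoted above follow directly from the formation counts. The sketch below reproduces that arithmetic and shows one way such a share could be computed from a list of formation months; the function name and its input format are assumptions for illustration, and only the totals 936, 761, and 1122 come from the text.

```python
import numpy as np

def season_fraction(formation_months, first_month, last_month):
    """Fraction of formations whose calendar month lies in [first_month, last_month]."""
    months = np.asarray(formation_months)
    in_season = (months >= first_month) & (months <= last_month)
    return in_season.sum() / months.size

# Counts quoted in the text (1970-2007 JTWC best track, 1122 formations).
total = 1122
jun_nov_share = 936 / total   # June through November, about 0.83
jul_oct_share = 761 / total   # July through October, about 0.68
```

With the best track formation months as input, `season_fraction(months, 6, 11)` would return the June through November share directly.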

Just as the temporal distribution of TC formations is not uniform
throughout the year, the distribution varies spatially over the extent of the study 
region. Figure 6 highlights the spatial variability from grid point to grid point. 

Figure 6. “Raw 2.5 Degree Formation Climatology; 1970-2007.” Contoured
probability of TC formation, constructed from binned
JTWC best track data from the years 1970-2007. Values represent the
probability that a TC will form per year in a given grid box.

Figure 6 shows what we term the “Raw 2.5 Degree Formation 
Climatology” that was constructed by summing the number of TC formations in 
the JTWC best track data for 1970-2007 within 2.5° latitude/longitude grid boxes, 
and then dividing the total per box by the number of years. The result is a map of 
the climatological, or long term mean, probability of TC formation during January- 
December. The probabilities are based on TC formation over the course of the 
entire year, so the contour values are not overly useful to most decision makers. 
See Appendix A for further discussion on TC formation climatology. 
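
The binning described above can be sketched in a few lines. This is a minimal illustration rather than the code used for Figure 6: the function name is hypothetical, the default bounds are the study region limits given earlier in this section, and longitudes east of the date line are assumed to be expressed on a 0-360 scale (for example, 170°W as 190) so that they fall inside the region.

```python
import numpy as np

def raw_formation_climatology(lats, lons, n_years, box_deg=2.5,
                              lat_bounds=(0.0, 30.0),
                              lon_bounds=(100.0, 190.0)):
    """Count formations per grid box and divide by the number of years."""
    lat_edges = np.arange(lat_bounds[0], lat_bounds[1] + box_deg, box_deg)
    lon_edges = np.arange(lon_bounds[0], lon_bounds[1] + box_deg, box_deg)
    # Points outside the region are ignored by histogram2d.
    counts, _, _ = np.histogram2d(lats, lons, bins=[lat_edges, lon_edges])
    return counts / n_years

# Three made-up formation points, with 38 years for 1970-2007 inclusive.
clim = raw_formation_climatology([10.0, 10.5, 20.0],
                                 [130.0, 131.0, 150.0], n_years=38)
```

The result is an array of 12 latitude by 36 longitude boxes, each holding the mean number of formations per year in that box.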

In this study, we attempted to develop and test a statistical-dynamical 
method for forecasting TC formation probabilities that is more skillful than the 
forecasts that could be obtained by simply using climatological TC formation 
probabilities (e.g., those shown in Figure 6 and discussed in Appendix A). For 
such a method to be more skillful than climatology, it needs to be skillful at 
representing climate anomalies in the large-scale environment that affect TC 
formations. Thus, it is useful to review the general conditions that influence TC 
formations in the WNP during the peak formation season. Figure 7 shows the 
main low-level circulation features that characterize summer conditions in the
WNP. Note in this figure the band of convergent and cyclonic flow marked by the 
dashed line. This band is often labeled the monsoon trough (Ramage 1995), so 
named because of its association with summertime monsoonal flow in the region. 



Figure 7. Schematic depiction of summertime low-level flow over the WNP.
The dashed line marks the monsoon trough and the zig-zag line indicates
the mean ridge axis (From Figure 1(a) of Lander 1996).

The monsoon trough is associated with the development of most TCs in 
the WNP (Xue and Neuman 1984), due to the predominantly favorable 
environmental factors (as described in Section II.C.). This is also indicated by 
the co-location of the high probabilities in Figure 6 and the climatological position 
of the monsoon trough in Figure 7. The position of the monsoon trough 
experiences normal seasonal variations through the year, as well as spatial and 
temporal deviations from its normal seasonal cycle. One example of a significant 
deviation is labeled a reverse-oriented monsoon trough, when the convergence 
zone extends from southwest (SW) to northeast (NE) over the WNP (Chu 2004). 

B. DATA SETS AND SOURCES 

The structure, format, and availability of the primary data sets used in this 
thesis are described in this section. The inquisitive reader is directed to the cited 
references for more information on each of these data sources. All of the data
used in this thesis are freely and publicly available online.

1. JTWC Best Track 

The JTWC maintains an archive of tropical cyclone data for the WNP. At 
a minimum, these “best track” files contain the latitude and longitude of the TC 
center location at six-hour intervals. These data are used for both model 
construction and verification in this study. 

The best track archive includes all TCs identified by the JTWC, and even 
includes a number of storms that were determined to be of sufficient strength for 
classification as TCs well after the storms occurred. The aforementioned TC 
numbers in Section II.A. are higher than those in some prior studies that 
analyzed only storms that were of tropical storm intensity or greater. 

The JTWC data set is not without controversy. Several researchers have 
noted that variations in analysis procedures, as well as changes in observational 
tools (satellite, aircraft, etc.) over the years, may compromise the overall 
consistency of the best track records. Furthermore,
Wu et al. (2006) cite notable differences in the track information from JTWC vice 
what is available from the Regional Specialized Meteorological Centre Tokyo; 
among the reasons for the discrepancies are differences in the time period over 
which winds were averaged, and differences in each center’s intensity-estimation 
techniques. However, efforts have been made, and are continuing, to minimize 
the discrepancies within the JTWC best track archive and between the JTWC 
archive and other sources for historical TC information (Chu et al. 2002). We 
determined that the potential problems with the JTWC best track data were not 
likely to significantly influence our study results. 

2. NCEP Reanalyses 

Global objective analyses that assimilate numerous observational data 
sources with model output and span many years provide an increased ability to 
investigate the physical processes that surround TC development. Prior to the
introduction of these so-called reanalyses, it was difficult to consistently 
investigate subtle variations in the climate system. We used two reanalysis data 
sets: (1) the NCEP/National Center for Atmospheric Research (NCAR) 
Reanalysis Projects (Kalnay et al. 1996; Kistler et al. 2001); and (2) the 
NCEP/Department of Energy (DOE) Atmospheric Model Intercomparison 
Project-II (AMIP-II) Reanalysis (Kanamitsu et al. 2002).

The NCEP/NCAR Reanalysis Projects data set (hereafter referred to as 
R1), and the NCEP/DOE AMIP-II Reanalysis data set (hereafter referred to as 
R2) are both based on assimilating data using a fixed model at T62L28 
resolution. Though both reanalyses use the same raw observational data, the 
R2 project attempts to correct some of the known errors in the R1 reanalysis; 
please review the cited publications for more details. 

Though other variables were tested, the final atmospheric variables used 
in the construction of our regression model (see Section III.A.) are all manually 
derived from “A” variables. Kalnay et al. (1996) note that an “A indicates that the 
analysis variable is strongly influenced by observed data and, hence, it is in the 
most reliable class.” 

For the purposes of this study, we used daily mean fields interpolated to a 
2.5° global grid. R2 was the primary dataset from which variables were derived, 
but R1 data were used in this research for verification and as a way to test the
model’s sensitivity to a specific reanalysis system. 

3. NOAA OISST 

Just as the atmospheric reanalyses are invaluable tools in developing 
empirical prediction methods, so too is a quality database of SSTs. The SST 
data used in developing our statistical model is the National Oceanic and 
Atmospheric Administration (NOAA) optimum interpolation (OI) SST analysis
version 2. SST values from this dataset are available in weekly means from 
1981 to present, at one degree latitude by one degree longitude horizontal 
resolution on a global grid. OISST data combine in situ and satellite-derived SST
measurements with bias adjustments (Reynolds et al. 2002).

To match our R1 and R2 data, the OISST fields were regridded
from one-degree resolution to 2.5° horizontal resolution and interpolated to daily
values.
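
The temporal part of this step can be sketched as follows. The text does not state which interpolation scheme was used, so linear interpolation between weekly means is an assumption here, as are the function name and the convention of assigning each weekly mean to a single day index. The spatial change of resolution is not shown.

```python
import numpy as np

def weekly_to_daily(weekly_sst, week_center_days, target_days):
    """Linearly interpolate weekly mean SST fields to daily values.

    weekly_sst: array of shape (n_weeks, n_lat, n_lon)
    week_center_days: increasing day index assigned to each weekly mean
    target_days: day indices at which daily fields are wanted
    """
    weekly_sst = np.asarray(weekly_sst, dtype=float)
    flat = weekly_sst.reshape(weekly_sst.shape[0], -1)
    daily = np.empty((len(target_days), flat.shape[1]))
    for j in range(flat.shape[1]):
        # np.interp works on one grid point's time series at a time.
        daily[:, j] = np.interp(target_days, week_center_days, flat[:, j])
    return daily.reshape((len(target_days),) + weekly_sst.shape[1:])

# Two made-up weekly fields (degrees C) on a tiny 2 x 2 grid.
weekly = np.stack([np.full((2, 2), 28.0), np.full((2, 2), 29.0)])
daily = weekly_to_daily(weekly, week_center_days=[3, 10],
                        target_days=np.arange(3, 11))
```

Days outside the range of the weekly centers take the nearest weekly value, which is the default behavior of `np.interp`.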

4. NCEP CFS 

The NCEP Climate Forecast System (CFS) is a one-tier fully coupled 
ocean-land-atmosphere dynamical assimilation and prediction system, which has 
been operational at NCEP since August 2004 (Saha et al. 2006). The 
atmospheric component of this coupled system is a reduced-resolution version of 
the more-familiar 2003 operational NCEP Global Forecast System (GFS), with
T62L64 resolution (equivalent to ~200 km Gaussian grid); the initial conditions
are obtained from the operational R2 (Saha 2008). This atmospheric component
is coupled once per day, without flux correction, with the Geophysical Fluid
Dynamics Laboratory (GFDL) Modular Ocean Model version 3 (MOM3). Four
CFS runs are executed daily, with integrations out to nine months. Of the two
runs initialized at 00Z and at 12Z, each has the same initial oceanic state, but a
slightly perturbed atmospheric state. The initial conditions for these runs are one 
day old for both the atmospheric and oceanic variables (Saha 2008). 

One appealing feature of the CFS is the availability of hindcast and bias 
correction fields. As noted in Section I.B.3.b., GCMs are often plagued by 
systematic errors. We are able to remove much of this systematic error, namely 
climate drift, by employing the forecast climatology that is available for all 
forecast lead times and the daily observed climatology. Such corrective fields 
are only available for a subset of variables. 
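
A minimal sketch of this kind of drift removal follows. It assumes the common form in which the lead-dependent forecast climatology is subtracted from the raw forecast and the observed climatology for the valid day is added back; the exact procedure applied to the CFS fields is not spelled out here, and all names and values are hypothetical.

```python
def bias_correct(forecast, model_clim, obs_clim):
    """Swap the model's lead-dependent climatology for the observed one.

    Each argument is a list of values at the same grid points:
    forecast   -- raw model forecast
    model_clim -- forecast climatology for the same lead time and calendar day
    obs_clim   -- observed climatology for the valid calendar day
    """
    return [f - m + o for f, m, o in zip(forecast, model_clim, obs_clim)]


raw = [29.4, 29.1, 28.6]         # hypothetical forecast SST (deg C)
model_clim = [29.0, 29.0, 29.0]  # hypothetical forecast climatology at this lead
obs_clim = [28.5, 28.6, 28.7]    # hypothetical observed climatology
corrected = bias_correct(raw, model_clim, obs_clim)
```

Because the forecast climatology depends on lead time, a separate `model_clim` field would be needed for each lead.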

From the available fields, in gridded binary format, we manually extracted 
daily SSTs at one-degree global coverage, daily atmospheric variables converted 
from their native Gaussian grid to a 2.5° latitude/longitude grid, and the 
appropriate bias correction fields. Once the variables were bias corrected and on 
the same latitude/longitude grid, we used the SSTs and atmospheric variables to 
force a statistical model to provide a probability of TC formation at every grid 
point. 

With numerous GCMs being used in climate science, one may wonder 
why we chose the CFS. In addition to being freely and publicly available, the 
CFS is the first operational, dynamical model with predictive skill on par with 
statistical methods used at CPC (Saha et al. 2006). Saha et al. (2006) also 
note that the “Niño-3.4 SST is probably the single most predictable entity [within 
the CFS].” Though the Niño-3.4 region is just outside of our WNP study region, 
we were motivated by the relatively high skill of the CFS in the Pacific basin, 
especially since prior studies have shown that SST variability in the Niño-3.4 
region is closely related to variations in the large scale environmental factors that 
influence TC formations in the WNP (Ford 2000; Chan 2004). In addition to the 
perceived skill, the CFS also offers a rudimentary ensemble construct. With two 
runs executed twice daily, the potential exists for a four-run ensemble with 
perturbed initial conditions. One could increase the number of ensemble 
members by incorporating runs from other days as well. The intention for the 
ensemble approach is to smooth out the differences between the runs in order to 
bring out the more predictable elements and, thereby, lead to enhanced 
predictive skill on average. 

C. VARIABLES OF INTEREST 

The existence of a set of large-scale environmental factors (LSEFs) that 
are influential in TC formation has been well documented over the last half 
century. Gray (1968, 1975, 1979) outlined a physical climatology of tropical 
cyclogenesis relative to six so-called genesis parameters. Other authors vary 
the list of these parameters, or factors, slightly and at times condense the list 
(e.g., low-level relative vorticity and the Coriolis parameter may be combined into 
a single absolute vorticity term). Regardless of the specific list of LSEFs one 
chooses, the physical properties are arguably quite similar. 

The LSEFs may each be necessary for tropical cyclogenesis, but a 
combination of these parameters alone may not be sufficient to diagnose or 
predict the transition from a convective disturbance into an organized TC (Frank 
1987). This idea of “necessary but not sufficient” suggests that the large-scale 
environment may not be solely responsible for determining whether a TC forms 
or not. Frank (2006) contends that “individual storms form infrequently and 
sporadically within large areas of favorable environmental conditions due to the 
effects of local flow perturbations.” Such a mesoscale trigger and/or perturbation 
in the local flow may be required to instigate tropical cyclogenesis, but abundant 
research supports the profound role of large-scale external forcing as a 
determining factor in tropical cyclogenesis (Briegel and Frank 1997). 

An underlying goal of this study is to predict favorable regions for tropical 
cyclogenesis at intraseasonal lead times, by way of forcing a statistical model 
with available output from a GCM. In our case, we use the NCEP CFS as our 
dynamical GCM, from which not all the LSEFs are available. To remedy this, we 
had to accomplish two tasks. First, we had to consider other parameters that are 
similar to the LSEFs described in the literature and may represent the same 
processes, and for which intraseasonal forecasts are readily available from the 
CFS. Second, we had to calculate additional variables based on available model 
output fields. For variables requiring spatial derivatives, we employed second 
order centered finite differencing; see Appendix B for more information regarding 
the calculation of additional variables from available fields. 
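
Appendix B documents the calculations actually used; the sketch below is only an illustration of second-order centered differencing on a regular latitude/longitude grid. It assumes the standard spherical-coordinate forms of relative vorticity and divergence (including the metric terms), a regular 2.5° longitude spacing, and hypothetical function and variable names.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius (m)


def vort_div(u, v, lats_deg, j, i, dlon_deg=2.5):
    """Relative vorticity and divergence (s^-1) at interior grid point (j, i).

    u, v     -- 2-D lists indexed [lat][lon] of zonal and meridional wind (m/s)
    lats_deg -- latitude of each row, ascending (degrees north)
    """
    phi = math.radians(lats_deg[j])
    dlam = math.radians(dlon_deg)
    dphi = math.radians(lats_deg[j + 1] - lats_deg[j - 1]) / 2.0
    a_cos = EARTH_RADIUS_M * math.cos(phi)

    # Centered differences: (value ahead - value behind) / (2 * spacing).
    du_dlam = (u[j][i + 1] - u[j][i - 1]) / (2.0 * dlam)
    dv_dlam = (v[j][i + 1] - v[j][i - 1]) / (2.0 * dlam)
    du_dphi = (u[j + 1][i] - u[j - 1][i]) / (2.0 * dphi)
    dv_dphi = (v[j + 1][i] - v[j - 1][i]) / (2.0 * dphi)

    metric = math.tan(phi) / EARTH_RADIUS_M  # spherical metric factor
    vort = dv_dlam / a_cos - du_dphi / EARTH_RADIUS_M + u[j][i] * metric
    div = du_dlam / a_cos + dv_dphi / EARTH_RADIUS_M - v[j][i] * metric
    return vort, div


# Hypothetical check: solid-body rotation, u = U cos(lat), v = 0.
lats = [12.5, 15.0, 17.5]
u = [[10.0 * math.cos(math.radians(lat))] * 3 for lat in lats]
v = [[0.0] * 3 for _ in lats]
zeta, delta = vort_div(u, v, lats, 1, 1)
```

For solid-body rotation the analytic vorticity is 2U sin(lat)/a and the divergence is zero, which gives a quick sanity check on the differencing.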

The genesis parameters, as proposed by Gray (1975), can be subdivided 
into thermodynamic and dynamic parameters. What follows is a brief look at 
these parameters, as well as some of the additional variables we considered. 
For the sake of brevity, not all of the variables that we investigated in our 
research are presented here. For more information on LSEFs and how they 
relate to TC development, the reader is directed to any of the plethora of articles 
and books on the subject (e.g., Gray 1975; Frank 1987). 

1. Thermodynamic Parameters 

Research suggests that sufficiently high SSTs and moisture in the 
mid-troposphere are important not only for TC formation, but also for tropical deep 
convection. These thermodynamic variables are often favorable for TC 
development over much of the tropical Pacific during much of the year (see 
Chapter III). 


a. Sea Surface Temperatures 

Frank (1987) contends that the high frequency of TC formation in 
the WNP, as compared to other ocean basins, is due, in part, to an expansive 
area of warm water (e.g., water warmer than 26°C). 

Figure 8. Average June - November SST (in °C) conditions over the WNP for 
the period 1982-2000, plotted from NOAA OISST data interpolated to 2.5° 
horizontal resolution. 


Figure 8 depicts the average SST conditions over the WNP during 
the peak formation season. Such a large expanse of warm water has led some 
researchers to conclude that SSTs may not be a primary factor affecting 
formation in the tropical Pacific (Chan 2004), as the temperatures are often 
sufficiently warm (e.g., greater than 26°C). 

Figure 9. Box plots of grid values of SST (in °C) grouped according to 
whether a TC formed in that day-grid box (indicated by Yes on the 
horizontal axis) or did not form in that box (indicated by No). The box 
height encompasses 50% of the SST data points, the whiskers (dashed 
lines) extend to include ~99% of the SST data points, and SST data that 
fell outside the whiskers (outliers) are indicated by the red “+” symbols. 
Constructed from NOAA OISST data with TC occurrences from the JTWC 
best track archive for the January-December period of 1982-2006. 


The box plots in Figure 9 separate the SSTs at 2.5° latitude × 2.5° 
longitude by day grid blocks according to whether a TC formed in the grid block 
(“Yes”) or not (“No”). The comparatively constrained appearance of the “Yes” 
box plot indicates that TCs seem to form in conjunction with a small range of 
SSTs in the upper 20s and low 30s degrees Celsius. Numerous sources, such 
as Frank (2006) and Meyer (2007), note that SSTs must meet or exceed 26.5°C 
to favorably support TC formation. These box plots support that, and suggest 
that a threshold for the WNP may be even more restrictive (i.e., ≥ 28°C). 
Physical reasoning and the sort of relationships shown in Figure 9 indicate that 
SST and TC formation probability at a given location should be directly and 
positively related to each other, if all other factors that influence TC formation are 
favorable and held constant. 

b. Humidity 

Early research on the climatologies of WNP TCs indicates that TCs 
only form in regions where seasonally averaged values of mid-tropospheric 
moisture are high. The physical explanation is that moist air in the middle 
troposphere is more conducive to deep convection and vertical coupling of the 
atmosphere (Gray 1975). 

Figure 10. Average June - November a) 500mb relative humidity (in %) and b) 
precipitable water (in kg m⁻²) conditions over the WNP for the period 
1971-2000, plotted from R1 data. 

Mid-tropospheric humidity variables are not available from the CFS, 
so we looked to precipitable water as a viable alternative to represent the 
available environmental moisture. Figure 10 shows average conditions during 
the peak formation period of mid-level relative humidity and total-column 
precipitable water. Though the units are not directly comparable, one should 
note the spatial agreement of the location of high humidities to the location of 
greatest precipitable water. As with SST, physical reasoning indicates that, all 
other factors being favorable and constant, an increase in atmospheric moisture 
content should lead to an increase in the probability of TC formation. We 
confirmed this with moisture-TC formation box plots (not shown) similar to those 
in Figure 9. 

2. Dynamic Parameters 

As noted earlier, favorable thermodynamic conditions are often present 
over expansive swaths of the WNP much of the year; therefore, dynamic 
parameters are thought to be responsible for determining whether a TC will form 
in a region that is thermodynamically favorable for TC formation. Gray (1975) 
noted the comparatively small spatial and temporal scales over which a 
disturbance will interact with its surrounding dynamic environment. These subtle 
interactions at smaller scales provide the motivation to use data at 2.5° resolution 
and daily time steps for this study, versus the previous work by Meyer (2007) that 
used 5° data at weekly time steps. 

a. Shear 

Numerous studies have found that large values of vertical wind 
shear in the large-scale environment tend to suppress TC formations. Though 
various definitions exist in literature, the most common measure of vertical wind 
shear is the mean vector wind at 850 mb subtracted from the mean vector wind 
at 200 mb. Such a calculation results in a magnitude and direction, though the 
magnitude alone is used in this work. Near the monsoon trough axis, vertical 
wind shear is minimal, allowing deep convection to be sustained and increasing 
the likelihood of TC formation in the region (Chan 2004). 

Figure 11. Average June - November magnitude of vertical wind shear (in m 
s⁻¹) over the WNP for the period 1971-2000, derived from R1 data. 
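
The shear definition given above (the 850 mb vector wind subtracted from the 200 mb vector wind, with only the magnitude retained) can be sketched as follows. The function name and the sample winds are hypothetical.

```python
import math


def shear_magnitude(u200, v200, u850, v850):
    """Magnitude (m/s) of the 200 mb minus 850 mb vector wind difference."""
    return math.hypot(u200 - u850, v200 - v850)


# Hypothetical daily mean wind components (m/s) at one grid point.
shear = shear_magnitude(u200=-6.0, v200=2.0, u850=-9.0, v850=-2.0)
```

Taking the magnitude discards the direction of the shear vector, consistent with the statement that only the magnitude is used in this work.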

Figure 11 displays the mean magnitude of vertical wind shear over 
the WNP during the peak formation season. The reader should note the 
co-location of the low mean shear pattern (Figure 11), the climatological monsoon 
trough (Figure 7), and the highest climatological probabilities of formation (Figure 
6). The box plots in Figure 12 solidify the relationship between the magnitude of 
vertical wind shear and probability of TC formation: TCs form in regions of low 
environmental wind shear. 

Figure 12. Box plots of grid values of the magnitude of the mean vertical wind 
shear (in m s⁻¹) grouped according to whether a TC formed in that day-grid 
box. The box height encompasses 50% of the data points, the whiskers 
(dashed lines) extend to include ~99% of the data points, and points that 
fall outside the whiskers (outliers) are indicated by the red “+” symbols. 
Constructed from R2 data with TC occurrences from the JTWC best track 
archive for the January-December period of 1982-2006. 


b. Upward Vertical Motion/Velocity 

During the peak formation season in the WNP, warm waters lie just 
to the west of the tropical upper tropospheric trough (TUTT) and near the 
entrance region of the climatological tropical easterly jet. Both features 
contribute to regions of upper-level divergence and/or persistent upward vertical 
motion, both shown to be favorable for cyclogenesis (Frank 1987). 

Figure 13. Average June - November a) 500mb omega (in Pa s⁻¹) and b) 
200mb divergence (in s⁻¹) conditions over the WNP for the period 1971-2000, 
derived from R1 data. 

Just as with the moisture variables, the availability of variables from 
the CFS influenced our choice of the variables to use to represent vertical motion 
in our statistical model. A variable directly representing vertical motion is not 
readily available from the CFS at daily time steps, thus we opted to test 200 mb 
divergence (calculated from the 200 mb zonal and meridional wind fields; see 
Appendix B for more information regarding these calculations). Figure 13 depicts 
the peak season averages of 500 mb omega and 200 mb divergence. 

Figure 14. Normalized January-December 500 mb omega vs. 200 mb 
divergence scatter plot, displaying sensitivity between the variables, 
constructed from R2 data for the period 1982-2006. 

Though the spatial patterns in Figure 13 suggest that upper-level 
divergence may be a suitable alternative to the more-traditional omega, we 
sought to test the sensitivity of these two variables. Figure 14 is a scatter plot of 
normalized divergence versus omega. Knowing the opposing sign conventions, 
the negative slope to the elongated cluster suggests that the variables are 
reasonably correlated, and that divergence may be a suitable replacement for 
omega. The box plots in Figure 15 indicate that TCs form in the WNP when 200 
mb divergence is weak, but skewed towards divergent outflow aloft. 

Figure 15. Box plots of grid values of upper-level divergence (in s⁻¹) grouped 
according to whether a TC formed in that day-grid box. The box height 
encompasses 50% of the data points, the whiskers (dashed lines) extend 
to include ~99% of the data points, and points that fall outside the 
whiskers (outliers) are indicated by the red “+” symbols. Constructed from 
R2 data with TC occurrences from the JTWC best track archive for the 
January-December period of 1982-2006. 
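
The omega-versus-divergence comparison described above can be summarized numerically with a correlation coefficient. The sketch below assumes that "normalized" means standardized to zero mean and unit variance, which the text does not state; the function names and sample values are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]


def correlation(xs, ys):
    """Pearson correlation, computed as the mean product of standardized values."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


omega = [-0.12, -0.05, 0.00, 0.04, 0.09]         # hypothetical 500 mb omega (Pa/s)
divergence = [6e-6, 3e-6, 0.5e-6, -2e-6, -4e-6]  # hypothetical 200 mb divergence (1/s)
r = correlation(omega, divergence)
```

A strongly negative `r` corresponds to the negative slope noted in the scatter plot, since upward motion is negative omega while outflow aloft is positive divergence.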


c. Vorticity 

The final genesis parameter is vorticity in the lower troposphere. 
As their behavior is different, we chose to investigate relative vorticity and 
planetary vorticity—as represented by the Coriolis parameter f—separately, as 
well as combined into a single low-level absolute vorticity term. Frank (1987) 
notes that relative vorticity may result from several sources, including from the 
intensification of monsoon trough circulations, waves in the easterlies, or along 
frontal zones that extend into the tropics. 

Figure 16. Average June - November 850 mb relative vorticity (in s⁻¹) 
conditions over the WNP for the period 1971-2000, derived from R1 data. 

The spatial pattern of Figure 16 should be familiar to the reader by 
this point, with the greatest average values of 850 mb relative vorticity in spatial 
agreement with the monsoon trough figure described earlier in this chapter. Of 
the three mechanisms noted by Frank that intensify relative vorticity, only the 
monsoon trough is persistent enough to be clearly represented in this six-month 
composite. 

Figure 17. Box plots of grid values of low-level relative vorticity (in s⁻¹) grouped 
according to whether a TC formed in that day-grid box. The box height 
encompasses 50% of the data points, the whiskers (dashed lines) extend 
to include ~99% of the data points, and points that fall outside the 
whiskers (outliers) are indicated by the red “+” symbols. Constructed from 
R2 data with TC occurrences from the JTWC best track archive for the 
January-December period of 1982-2006. 


The box plots in Figure 17 support what many previous authors 
have found, that weak to positive low-level relative vorticity relates to an increase 
in TC formation probability. Not shown are similar sets of plots for planetary 
vorticity and absolute vorticity. In agreement with previous studies, we find that 
Coriolis parameter has a positive relationship with TC formation, and that the 
vast majority of TCs form several degrees or more from the equator. 

3. Model Variable Selection 

Several of the variables that are either directly available from the CFS or 
are easily derived from CFS output represent similar large-scale environmental 
conditions and processes. Thus, to represent, for example, vertical motion, we 
had to choose between 200 mb divergence and 200 mb relative vorticity, since 
both of these variables represent vertical motion (the former does so explicitly 
and the latter does so implicitly). In making this sort of choice, we favored 
variables that: 

1) have physically plausible relationships to TC formation, 

2) are readily available directly from the CFS, or 

3) are easily derivable from available CFS variables, and 

4) are relatively skillfully predicted by the CFS. 

4. Climate Oscillations and Model Variable Relationships 

Numerous prior studies describe the intraseasonal and interannual 
variability of TC formation, especially as it relates to climate oscillations (e.g., 
Ford 2000; Chan 2004). Of the climate oscillations that impact TC activity, the 
most often investigated are El Niño and La Niña (ENLN). ENLN are anomalous 
oscillations of the tropical atmosphere and ocean that can alter the large-scale 
environment in ways that influence TC formations, intensities, and tracks (e.g., 
Ford 2000). Wang and Chan (2002) offer a good illustration of how ENLN can 
influence TC activity. They note that during the latter months of an El Niño year, 
low-level anomalous westerlies encompass much of the WNP. These 
anomalous winds lead to positive relative vorticity anomalies in the region, which 
provide a favorable environment for TC formation that is both later in the year 
than normal, and displaced farther to the east. 

In addition to ENLN, much focus has been directed at the influence of 
intraseasonal tropical oscillations; for example, investigations into the effects of 
the MJO on TC formation in the WNP. Frank and Roundy (2006) show that 
when MJO activity is high, TCs are more likely to form in the convectively active 
portions of the MJO. As with ENLN, it is likely that the impacts of intraseasonal 
variations on TCs occur mainly via alterations of the LSEFs. 

Though climate oscillations and their impacts on TC activity are beyond 
the scope of this thesis, these brief notes are included because of their 
relationship to the subject of this thesis: intraseasonal prediction of tropical 
cyclogenesis. With changes in the large-scale circulation in the tropics, the 
thermodynamic and/or dynamic genesis parameters may be modified; these 
modifications, in turn, alter the TC activity. The idea is that if oscillations (ENLN, 
MJO, etc.) that have been shown to impact TC formation are skillfully predicted 
by the CFS, including the variations in the LSEFs associated with these climate 
oscillations, then a statistical-dynamical method based on the relationships 
between the LSEFs and TC formations can be skillful regardless of the oscillatory 
state of the climate system. 


D. PROBABILISTIC EQUATION DEVELOPMENT 
1. Logistic Regression 

Logistic regression, also referred to as logit regression, is an appropriate 
statistical tool for this application. Given a combination of independent variables, 
logistic regression provides the probability of occurrence of the dependent, binary 
variable. Let $p_f$ be the probability of TC formation at a given grid point for a 
given time period (one day, in this study); since $p_f$ is a probability it is bounded 
by zero and one. 

The natural logarithm of the odds ratio of the probability is called the logit, 
where: 

$$\text{Logit} = \ln\left(\frac{p_f}{1 - p_f}\right)$$

We used the statistical analysis software S-Plus to find the optimal values 
of the intercept $b_0$ and the coefficients $b_k$ for each contributing variable $x_k$, such 
that: 

$$\ln\left(\frac{p_f}{1 - p_f}\right) = b_0 + b_1 x_1 + \dots + b_k x_k$$

Then the probability of TC formation may be calculated based on a linear 
combination of the optimal value coefficients and explanatory variables: 

$$p_f = \frac{e^{(b_0 + b_1 x_1 + \dots + b_k x_k)}}{1 + e^{(b_0 + b_1 x_1 + \dots + b_k x_k)}}$$

For more information regarding logistic regression, the reader is directed 
towards Wilks (2006), Devore (2000), or most any college-level statistics text. 
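
The relationship between the logit and the probability can be checked with a few lines of code. This is a generic sketch of the logistic transform, not the S-Plus fit itself; the function names and the sample coefficients are hypothetical.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def formation_probability(intercept, coefficients, predictors):
    """Probability from a linear combination of predictors.

    1 / (1 + exp(-z)) is algebraically the same as exp(z) / (1 + exp(z)).
    """
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical two-predictor model: z = -2.0 + 0.5 * 3.0 - 1.0 * 0.25 = -0.75
p = formation_probability(-2.0, [0.5, -1.0], [3.0, 0.25])
```

Applying `logit` to the returned probability recovers the linear combination, which is the sense in which the regression is linear in the logit.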

a. Dependent Variable 

Within the framework of logistic regression, TC formation at a given 
grid point is modeled as a binary response variable and is expressed as either 
zero (no formation) or one (formation observed). As such, this approach 
provides the model with no information as to the strength or duration of the 
storm. We feel that this approach remains viable despite this limitation. McBride 
(1981) comments, with respect to compositing data, that “the averaging process 
smears out the diversity between different systems and enhances features in 
common.” As such, we hope our method is applicable over more scenarios, as a 
result of including a wide variety of storms in the training of the regression model. 

b. Independent Variables 

Technically, the approach we are using is multivariate logistic 
regression, as we are allowing multiple independent, or explanatory, variables to 
contribute to the probability. Ideally, all the independent variables in a multiple 
logistic regression analysis would be just that, independent. As noted earlier, 
the LSEFs are inter-related in a linked ocean-atmosphere system, thus the 
variables will all have some degree of correlation with each other. This lack of 
true independence will allow combinations of variables to negate the need for 
others. For example, high relative humidity often occurs in regions of warm SST, 
positive low-level vorticity, and upward vertical motion. Therefore, if the latter 
three variables favorably exist, the addition of a humidity variable may not be 
required to ascertain the favorability of the large-scale environment. 

2. Model Training 

Statistical methods, such as logistic regression, predict the response to 
variables based on a historical record; therefore, one must reach a balance 
between the length and the quality of the climate record. For this reason, we 
utilize data only from the satellite era. In developing such statistical tools, one 
must also assume a degree of stationarity of the climate system, which we know 
is not entirely the case. 

We used R2 and OISST data to train our statistical model; the availability 
of both of those datasets limited us to the years 1982 - 2006, inclusive. Various 
forms of the model were tested, some of which were trained over the entire year, 
others were trained over just the peak formation season. 

When a model is trained over all months for the 25-year period, the size of 
the dataset becomes somewhat cumbersome [13 (latitude grid boxes in WNP) x 
37 (longitude grid boxes) x 365 (days, excluding leap days) x 25 (years) = 
4,389,125 day grid points per variable!]. One approach for reducing the 
needed dataset is to include all the points wherein a TC was observed, but only 
include a portion of the remaining “non-occurrence” points. We refer to the data 
from all the day grid points at which a TC was not observed as non-TC 
information (NTCI). Various forms of the model were tested using various 
amounts (as percentages) of NTCI. 
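
As an illustration of this thinning step, the sketch below keeps every day grid point with a TC formation and a random fraction of the NTCI. The record layout, the function name, and the 10% fraction are hypothetical; the text does not state how the NTCI subset was drawn.

```python
import random


def subsample_ntci(records, keep_fraction, seed=0):
    """Keep every TC-formation record and a random share of the non-TC records."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return [r for r in records if r["tc_formed"] or rng.random() < keep_fraction]


# Size of the full-year training set quoted in the text.
n_day_grid_points = 13 * 37 * 365 * 25

# Hypothetical toy dataset: 1000 day grid points, 10 of them with a formation.
records = [{"tc_formed": i % 100 == 0} for i in range(1000)]
sample = subsample_ntci(records, keep_fraction=0.1)
```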

3. Model Selection 

We made use of a series of tests to ensure our model was statistically 
sound and to assess the overall goodness of fit. Our goal was to develop a 
model that is physically defensible, stable, and reliable. The following tools are 
among those we relied upon to select our model from the numerous forms of the 
model that we tested. 

a. Akaike Information Criterion 

The Akaike Information Criterion (AIC) is a goodness-of-fit measure 
that seeks to find a balance between model fit and complexity. The model 
complexity is handled by imposing a penalty for the number of terms included in 
the equation. A lower AIC suggests a better-fitting model. Refer to Wilks (2006) 
or Burnham and Anderson (2002) for more information concerning AIC. 

b. Deviance 

We also used the residual deviance numbers to compare models. 
In a simplistic manner, the amount of deviance explained by a model suggests 
how much of the variability is accounted for by the combination of the included 
terms of the model. The logic for this test is that the greater the goodness of 
fit of a model, the lower the residual deviance associated with that model. 
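
For a binary response, both the AIC and the residual deviance follow from the model log-likelihood. The sketch below assumes the standard definitions (residual deviance = -2 ln L for 0/1 data, AIC = 2k - 2 ln L, with k the number of fitted terms); S-Plus may report these quantities with different conventions, and the sample values are hypothetical.

```python
import math


def log_likelihood(observed, predicted):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    return sum(math.log(p if y == 1 else 1.0 - p)
               for y, p in zip(observed, predicted))


def residual_deviance(observed, predicted):
    """Residual deviance for ungrouped binary data: -2 times the log-likelihood."""
    return -2.0 * log_likelihood(observed, predicted)


def aic(observed, predicted, n_terms):
    """AIC: twice the number of fitted terms plus the residual deviance."""
    return 2.0 * n_terms + residual_deviance(observed, predicted)


observed = [1, 0, 0, 1]            # hypothetical outcomes
predicted = [0.8, 0.1, 0.3, 0.6]   # hypothetical fitted probabilities
dev = residual_deviance(observed, predicted)
score = aic(observed, predicted, n_terms=2)
```

With the fit held fixed, each added term raises the AIC by two, which is the complexity penalty described above.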

c. Stability 

To assess whether our model contains too many explanatory 
variables, often referred to in statistics as being overfit, we examined how much 
the variable coefficients vary when the model is constructed over different 
training periods. A model is considered more stable, and less likely to be 
overfit, the less its coefficients vary when derived from 
different training periods. 

d. Physical Plausibility 

A viable model must indicate relationships that fit the conceptual 
models identified in prior studies and noted in the section on LSEFs. For 
example, we expect SSTs to have a positive relationship with the likelihood of TC 
formation. A model that suggests a negative relationship between SST and the 
likelihood of TC formation would be suspect. In our research, we encountered 
models that suggested humidity and the probability of TC formation are inversely 
related; such a negative coefficient is not physically defensible and likely results 
from multicollinearity between the LSEFs included in the model. This 
multicollinearity may result from a lack of independence among predictor 
variables, in this case between humidity and the other variables included in the 
regression equation (e.g., SST, divergence; see also Devore 2000). 

4. Model Verification 

We made use of several key metrics for assessing the skill and value of 
our model when the model was used to conduct multi-year zero-lead hindcasting 
and non-zero lead hindcasting case studies. Such metrics include the number of 
hits and misses, the Brier score (BS) and Brier skill score (BSS), the reliability 
diagram, the relative operating characteristic (ROC) curve, and the economic 
value diagram (EVD). 
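
Of these metrics, the Brier score and Brier skill score are the simplest to state in code. The sketch assumes the usual definitions, with the sample climatology (base rate) as the reference forecast; the reference used in our verification is not specified in this section, and the sample values are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """Skill relative to a constant forecast of the observed base rate (assumed)."""
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference


outcomes = [0, 0, 1, 0]        # hypothetical observations
probs = [0.1, 0.2, 0.7, 0.1]   # hypothetical forecast probabilities
bs = brier_score(probs, outcomes)
bss = brier_skill_score(probs, outcomes)
```

A positive skill score means the forecasts beat the reference; a score of zero means they do no better than always forecasting the base rate.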

5. Motivations for a Probabilistic Forecast 

Among the reasons for selecting multivariate logistic regression as the 
statistical tool by which to develop a statistical-dynamical prediction method are 
the potential benefits of producing probabilistic forecasts. In order to reap these 
benefits, the probabilities must represent true probabilities. The probabilities may 
not be true probabilities if the model is ill constructed. One potential 
benefit of probabilistic forecasts is that customers may compare the true probabilities 
to the risk profile of a given mission and thereby adjust their decision 
making. Also, such probabilistic output allows for a relatively straightforward 
conversion to anomaly-type forecasts that may be a useful deliverable for many 
decision makers. 

E. SUMMARY OF PREDICTION METHOD 

Figure 18 is a schematic of the process involved in creating and 
operationalizing the prediction process used in this thesis. This process is a 
combined statistical-dynamical one, wherein one uses a numerical model to force 
a statistical model to generate ensemble based, probabilistic, intraseasonal 
predictions of TC formations. 

Figure 18. Depiction of the process for generating intraseasonal predictions of 
tropical cyclogenesis. 

III. RESULTS 


A. REGRESSION MODEL 

The underlying goal in generating a regression model is to construct an 
equation for the probability of TC formation for individual day grid points based 
on the values of corresponding atmospheric and oceanic variables. Multivariate 
logistic regression was used to find optimal values of the intercept $b_0$ and the 
coefficients $b_k$ for each contributing variable $x_k$, such that the probability of TC 
formation $p_f$ at any given day grid point is: 

$$p_f = \frac{e^{(b_0 + b_1 x_1 + \dots + b_6 x_6)}}{1 + e^{(b_0 + b_1 x_1 + \dots + b_6 x_6)}}$$

Table 1 below lists the variables and coefficients that are included in the 
model we chose. The paragraphs that follow highlight some of the key details as to 
how this model was constructed and why it was chosen from amongst the many 
model permutations tested. 

Table 1. Variable coefficients and related statistics, generated over a 
June-November training period for the years 1982-2006. 

Variable                          Regression Coefficient   Significance Rank   Standard Error   t Value
-    (Intercept)                  b0   -27.41179           -                   1.81639          -15.09
x1   850 mb Rel. Vorticity        b1   167645.1            1                   7074.82          23.69
x2   850 mb Rel. Vorticity^2      b2   -1679802094.0       2                   112033900        -14.99
x3   SST                          b3   0.6567593           3                   0.06061          10.83
x4   Vertical Wind Shear          b4   -0.05990173         4                   0.00687          -8.71
x5   Coriolis Parameter           b5   15861.34            5                   2646.58          5.99
x6   200 mb Divergence            b6   24729.49            6                   6152.83          4.01


As one can see, the variables selected for inclusion in the model include: 
850 mb relative vorticity, 850 mb relative vorticity squared, SST, vertical wind 
shear, Coriolis, and 200 mb divergence. The magnitudes of the regression 
coefficients are not indicative of the relative importance of each term, but are 
reflections of the units of the variable. In addition, the units of the coefficients are 
the inverse of the units of the associated variable, thus the linear combination of 
the variables and their coefficients is unitless. One may also note what is not 
included in this equation that appears in the original listing of genesis parameters 
by Gray (1975), that being a term representing mid-level humidity. The results 
from statistical testing indicated a significant degree of multicollinearity between 
such a moisture variable and the other terms of the equation. The exclusion of a 
moisture variable is not to say it is not important for the formation of TCs, but 
rather that the combination of the other variables (cyclonic low-level circulation 
over warm ocean water, etc.) acts as a suitable proxy for a moisture variable. 

Of the included variables, SST is the only one directly available from the 
CFS. The Coriolis parameter is a function of latitude, and thus requires no model 
input. The remaining variables are all calculated from the 200 mb and 850 mb 
zonal and meridional winds, which are available from the CFS. Despite the need 
for these calculations, we feel that these variables are likely more predictable 
within the CFS than other variables that are more dependent upon 
parameterizations. For example, a variable for precipitation rate within the CFS
would be highly dependent upon the convective parameterization scheme,
whereas the upper-level component winds are based more on observational
data assimilated directly into the model and integrated via the primitive
equations.
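
As an illustration of the calculations mentioned above, the sketch below derives the wind-based LSEFs from gridded zonal and meridional winds using centered finite differences on a latitude-longitude grid. This is a sketch under stated assumptions rather than the procedure used in this thesis: the function name is hypothetical, and the definition of vertical wind shear as the magnitude of the 200 mb minus 850 mb vector wind difference is an assumption.

```python
import numpy as np

EARTH_RADIUS = 6.371e6   # mean Earth radius, m
OMEGA = 7.2921e-5        # Earth's rotation rate, rad/s


def lsefs_from_winds(u850, v850, u200, v200, lat_deg, lon_deg):
    """Return (vort850, div200, shear, coriolis) on the input grid.

    Wind arrays have shape (nlat, nlon) in m/s; lat_deg and lon_deg are
    1-D coordinate arrays in degrees. The grid must not include a pole.
    """
    lat = np.deg2rad(lat_deg)
    lon = np.deg2rad(lon_deg)
    cos_lat = np.cos(lat)[:, None]
    tan_lat = np.tan(lat)[:, None]

    # 850 mb relative vorticity in spherical coordinates.
    dv_dlon = np.gradient(v850, lon, axis=1)
    du_dlat = np.gradient(u850, lat, axis=0)
    vort850 = (dv_dlon / (EARTH_RADIUS * cos_lat)
               - du_dlat / EARTH_RADIUS
               + u850 * tan_lat / EARTH_RADIUS)

    # 200 mb horizontal divergence in spherical coordinates.
    du_dlon = np.gradient(u200, lon, axis=1)
    dvcos_dlat = np.gradient(v200 * cos_lat, lat, axis=0)
    div200 = (du_dlon + dvcos_dlat) / (EARTH_RADIUS * cos_lat)

    # Assumed shear definition: magnitude of the 200-850 mb vector difference.
    shear = np.hypot(u200 - u850, v200 - v850)

    # Coriolis parameter depends on latitude only; broadcast to the grid.
    coriolis = 2.0 * OMEGA * np.sin(lat)[:, None] * np.ones_like(u850)
    return vort850, div200, shear, coriolis
```

A convenient sanity check is solid-body rotation (u proportional to the cosine of latitude, v equal to zero), for which the relative vorticity should be proportional to the sine of latitude and the divergence should vanish.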

As mentioned earlier, a key factor in selecting a regression model is
ensuring physical plausibility. All the included variables have been shown to be
influential, or are known to be closely related to variables that have been shown
to be influential, in tropical cyclogenesis. In addition, the sign on each
coefficient fits the conceptual model of that variable’s relationship to TC
formation. More specifically, low-level relative vorticity, Coriolis parameter,
SST, and upper-level divergence each have positive coefficients, and an increase
in one or all of those variables translates into a more favorable environment for
TC formation. The negative coefficient on the vertical wind shear term indicates
that the lower the vertical wind shear, the more favorable the environment for TC
formation. The negative coefficient on the squared vorticity term is plausible as
well, as described later in this section.

The significance ranks listed in Table 1 are based on the probability, per
the chi-squared test, that the given term is not significant to the performance
of our model. These rankings may be interpreted as indicating that 850 mb
relative vorticity has the lowest probability of not being significant to our model,
and thus may be viewed as the most statistically influential component of the
model. All of the terms included in this model are statistically significant;
therefore, even though the 200 mb divergence term has the lowest significance
rank, it is still a significant contributor to the model.

An issue that plagued the development of this model was the persistence 
of storms after the formation day. In developing the model, we assigned a hit, or 
occurrence value of one, to the day grid point at which the JTWC best track data 
placed the formation point for each given storm. As the LSEFs appeared to vary 
little from the day of formation to the days immediately surrounding the formation 
day, the regression model was forced to discern the difference in the LSEFs 
between those days, in essence asking why was one day a hit and the following 
day—with nearly identical LSEFs—a non-occurrence point? To make matters 
worse, the R2 data used in the training of the models often depicts the storm 
tracks well. Therefore, the LSEFs following the JTWC formation date were often 
more favorable than on the formation date. This is especially true for the 
dynamic variables. As a result, we needed a way to focus the model on the
day of formation and to introduce a variable or mechanism that identifies
when a well-developed storm is being depicted by the assimilated reanalysis
fields.

We adopted three modifications in the construction of this model to focus
on the formation day and to reduce the model-predicted probabilities associated
with storms that have already formed. First, we adopted a mean sea level
pressure (MSLP) filter. Before including a non-occurrence day grid block in the
regression, we filtered out blocks for which the MSLP was less than 990 mb.
Second, we reduced the NTCI to 40%; this allowed us to randomly eliminate
some blocks associated with storms that have already formed, while still retaining
784,660 non-occurrence points in the development of the regression model.
Third, we added the squared 850 mb relative vorticity term.

The squared 850 mb relative vorticity term forces a non-linear response to
the 850 mb relative vorticity in the generalized linear model. Of all the LSEFs,
the low-level vorticity appears to change the most through the life of a TC. To
focus our model on the formation day, rather than on times when a storm is a
well-developed circulation center, we included this vorticity squared term in
the regression model. With its negative coefficient, this term acts to decrease
the probability of TC formation as the relative vorticity becomes large. In
essence, we are attempting to decrease the likelihood of formation in regions
where a TC already exists.
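
The effect of the squared term can be made explicit with a short calculation on the two vorticity coefficients from Table 1. The sketch below, which assumes vorticity is expressed in s^-1 (an assumption for illustration), finds the vorticity at which the combined contribution of the linear and squared terms to the regression equation is largest; beyond that value, further increases in vorticity reduce the modeled probability.

```python
B1 = 167645.1        # coefficient on 850 mb relative vorticity (Table 1)
B2 = -1679802094.0   # coefficient on squared 850 mb relative vorticity (Table 1)


def vorticity_contribution(vort):
    """Combined contribution of both vorticity terms to the linear predictor."""
    return B1 * vort + B2 * vort ** 2


# The parabola B1*vort + B2*vort**2 peaks where its derivative,
# B1 + 2*B2*vort, equals zero.
vort_peak = -B1 / (2.0 * B2)
```

With the Table 1 coefficients, `vort_peak` is about 5 x 10^-5 in the assumed units, and the combined contribution returns to zero at twice that value.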

Other variables considered for inclusion, but not appearing in the final
form of the model, include, but are not limited to, 200 mb relative vorticity,
thickness, MSLP, precipitable water, and 850 mb divergence. We also
entertained the inclusion of combinations of several variables, such as absolute
vorticity rather than relative vorticity and the Coriolis parameter separately, and a
combined upper-level minus low-level divergence term.

The final form of the model outlined in Table 1 was trained only on the 
peak formation period, June through November, for the years 1982 through 
2006. To evaluate the stability of the model (see Section II.D.S.c), we developed 
the regression equation several times, each time excluding one year from the 
training period. Table 2 lists the coefficients from some of these runs. The 
variations in the coefficients are minor; therefore, we concluded that our model is 
stable and not overfit. Excluding years also provided us with years of 
independent data (years over which the model was not trained) with which to 
conduct additional verification.

Table 2. Comparison of regression coefficients for models with altered training
periods. The training period for the full model is all years during 1982-2006.

Variable           Full Model     Excluding 1982   Excluding 1997   Excluding 2001   Excluding 2006
(Intercept)        -27.41179      -27.51922        -27.44453        -27.40436        -27.17824
Rel. Vorticity     167645.1       164535.9         165662.7         166313           171080.2
Rel. Vorticity^2   -1679802094    -1641682993      -1638789695      -1668518817      -1770825439
SST                0.6567593      0.661396         0.6575657        0.6569342        0.6482437
Shear              -0.05990173    -0.06006378      -0.05951317      -0.05778334      -0.05813202
Coriolis           15861.34       16133.2          16637.09         15938.68         16204.37
Divergence         24729.49       26888.63         24222.44         23897.41         23509.69

This model was trained on data with daily temporal resolution, which
poses two potential challenges. First, as TCs are rare events (626 formations
from among 785,286 day grid blocks in the training period), the daily
probability of TC formation is very low. This is true even for the most
favorable locations (i.e., the climatological position of the monsoon trough)
and times of year. For example, daily probabilities during the height of the
peak formation period in favorable regions seldom exceed 0.05, or 5%. Such low
probabilities, even if reliable, may be a challenge for forecasters and
operators to interpret. Second, at daily scales, the predictability at
intraseasonal leads of the variables included within this model tends to be
low.

Figure 19. Example of contoured, seven-day summed probabilities, centered
about the 264th day (21 September) of 2001, constructed from R2 and
OISST fields using the model described above. The red dot indicates the
verification point for a TC that formed on 21 September 2001.

In order to address these problems concerning daily probabilities, we 
investigated non-native versions of the probabilistic output. The version upon 
which we settled was a summed seven-day probability. When using TC 
formations to verify this model output, we compared the TC formation date and 
location to the sum of the output probabilities for the seven days centered on the 
formation day: the three days prior to formation, the day of formation, and the 
three days following formation. Figure 19 is an example of seven-day summed 
probabilities from a hindcast valid 21 September 2001; the days summed to 
create this plot are 18 September through 24 September. The subsequent plot 
(not shown) would be valid on 22 September 2001, and be the summation of 19 
September to 25 September daily probabilities. The reasons for favoring this 
seven-day summation were threefold. First, the probabilities of formation at daily 
time steps are small due to the rarity of TC formation, so the summation 
increases the probabilities in active grid blocks to values that may be used in 
decision-making by users. Second, the daily output of summed seven-day 
probabilities should enhance the predictability within the model, as it reduces the 
potential impacts of timing error within the forecast fields, and provides a better fit 
with the time averaging approach that tends to enhance the skill of long lead 
forecasts. Third, this approach enhances the usefulness as a planning product;
for example, if operators are planning, at intraseasonal lead times, multi-day
transits of the WNP, a multi-day probability forecast may be a better match to the
planning process. The probabilities shown in the remainder of this thesis are
seven-day summed probabilities, unless otherwise specified.
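
The seven-day summation described above amounts to a centered moving sum along the time axis of the daily probability fields. A minimal sketch follows; the function name and the (time, lat, lon) array layout are assumptions, and days without three full days on either side are left undefined here, a choice the text does not specify.

```python
import numpy as np


def seven_day_sum(daily_prob, window=7):
    """Centered moving sum of daily probabilities along the time axis.

    daily_prob has shape (ntime, nlat, nlon). The output has the same
    shape; days lacking a full window on either side are set to NaN.
    """
    daily_prob = np.asarray(daily_prob, dtype=float)
    half = window // 2
    ntime = daily_prob.shape[0]
    summed = np.full(daily_prob.shape, np.nan)
    for t in range(half, ntime - half):
        # Three days before, the valid day, and three days after.
        summed[t] = daily_prob[t - half:t + half + 1].sum(axis=0)
    return summed
```

For example, the field valid on day 264 is the sum of the daily fields for days 261 through 267. Because daily probabilities are small, the sum stays close to the probability of at least one formation during the week.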

B. VERIFICATION OF THE REGRESSION MODEL 

As depicted in Figure 1 of this thesis, Verification/Evaluation is a vital step
in the climate prediction process. Such verification and evaluation are required for
two primary reasons: to identify potential shortfalls or weaknesses that may be
corrected by redoing the model development stage, and to ascertain the
potential skill and value the method offers potential users.

In our verification of this regression model, we faced two complicating
factors. First, we are actually predicting the favorability of the large-scale
environment to support TC formations, not the formations themselves. This
shortcoming returns to the idea that the LSEFs used in the model are necessary,
but may not be sufficient, as noted in Section II.C. This complication arises when
one uses actual TC occurrences to verify what are essentially forecasts of the
propensity for TC formation based on environmental factors. The second
complicating factor is that few techniques exist to verify spatially distributed
predictions of events that are as rare as TC formations.

Other organizations that are delving into the realm of intraseasonal climate 
prediction appear to be struggling with verification as well. With no standard 
approach as to how to verify such predictions, we feel the best approach to 
verification is to use several methods in concert. 

1. Quantitative Verification 

The first class of verification we explored was quantitative verification. For
the sake of brevity, we note only the key points for each quantitative verification
technique. The reader is directed to references, such as Wilks (2006) or Eckel
(2008), for additional details on the construction and interpretation of these
verification techniques. Paramount in quantitative verification is having sufficient
forecasts to verify. In order to encompass a sufficient number of storms, the
verification in this section is for multi-year zero-lead hindcasts over dependent
data. The period of verification, unless otherwise specified, is the June through
November peak formation period, as this matches the period over which the
model was trained and reduces the potential for data dilution from the months
when few storms develop. Over this period for the years 1982 to 2006, 626
storms were identified by the JTWC in the region we define as the WNP, versus
752 storms if the verification period is expanded to encompass every day of the
year for the same years. Thus, relatively few TCs were left out of the verification
process when we limited ourselves to verifying using just June-November TCs.

Many of the quantitative verification techniques that follow are based on
dichotomous observation values: a value of one if the event is observed, or zero
if the event is not observed. For our verifications, we opted to credit an observed
value to any grid point that fell at or within a 2.5° radius of the JTWC formation
point. We feel a 2.5° radius about the formation point accounts for the spatial
influence of a forming TC, as well as for some of the uncertainty in the
formation location in the JTWC best track data.
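
The crediting of observed values around a formation point can be sketched as a simple mask on the verification grid. In this sketch the function name is hypothetical, and the radius is measured as a straight-line distance in degrees of latitude and longitude, which is an assumption; the text does not state how the 2.5° radius was computed.

```python
import numpy as np


def observation_mask(lat_grid, lon_grid, form_lat, form_lon, radius_deg=2.5):
    """Return a (nlat, nlon) array of ones and zeros.

    A grid point receives a one if it lies at or within radius_deg of the
    formation point, with distance measured in degrees (an assumption).
    """
    lat2d, lon2d = np.meshgrid(lat_grid, lon_grid, indexing="ij")
    dist = np.hypot(lat2d - form_lat, lon2d - form_lon)
    return (dist <= radius_deg).astype(int)


# Hypothetical 2.5-degree grid and formation point, for illustration only.
lat_grid = np.arange(0.0, 40.1, 2.5)
lon_grid = np.arange(100.0, 180.1, 2.5)
mask = observation_mask(lat_grid, lon_grid, form_lat=15.0, form_lon=140.0)
```

When the formation point coincides with a grid point on a 2.5° grid, this definition credits that point and its four nearest neighbors, while the diagonal neighbors fall outside the radius.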

To provide us with a standardized measure of performance based on our
predictions of the probability of formation, we used the Brier skill score (BSS).
Over the peak formation season, our model results in a BSS of 0.029055
(0.028211...0.02994). The range in parentheses represents a 95%
confidence interval, generated through jackknifing each of the years in the
training period. Recall that positive values of the BSS represent improvement
over the sample climatology baseline; thus, our model shows positive skill. When
verifying the model over the full year, the BSS increases to 0.032555
(0.031927...0.033182). Eckel (2008) notes that BSS is vulnerable to dataset
dilution, which likely accounts for this increase in the skill score when verifying
over the entire year.

Figure 20. Reliability diagram (left) and bin histogram (right) generated with
minimum bin intervals of 0.005 for the zero-lead hindcasts, from the model
outlined in Table 1 over the June - November period for 1982 to 2006;
error bars represent a 95% confidence interval.
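
The Brier skill score and the year-by-year jackknife referred to above can be sketched as follows. The reference forecast is the sample climatology, consistent with the baseline named in the text. The function names are hypothetical, and because the text does not describe how the 95% interval was derived from the jackknife samples, the sketch stops at the leave-one-year-out scores.

```python
import numpy as np


def brier_skill_score(prob, obs):
    """BSS of probability forecasts relative to the sample climatology."""
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=float)
    brier = np.mean((prob - obs) ** 2)
    # Reference forecast: always issue the observed base rate.
    brier_ref = np.mean((obs.mean() - obs) ** 2)
    return 1.0 - brier / brier_ref


def jackknife_bss(prob, obs, years):
    """BSS recomputed with each year left out in turn."""
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=float)
    years = np.asarray(years)
    return np.array([
        brier_skill_score(prob[years != y], obs[years != y])
        for y in np.unique(years)
    ])
```

Here `prob`, `obs`, and `years` are flattened arrays with one entry per day grid block; the spread of the values returned by `jackknife_bss` indicates how sensitive the score is to any single year.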



Figure 21. The same reliability diagram as in Figure 20, but focused in on the
lower probabilities; error bars represent a 95% confidence interval.

Figures 20 and 21 depict the reliability diagrams for zero-lead hindcasts
with the model outlined in Table 1, for the peak formation seasons of the years
1982 through 2006. The reliability diagram for a perfectly reliable model would
lie along the diagonal indicated by the dashed line. Points within the region
defined by the solid lines indicate positive skill. The figures show graphically
what we learned from the BSS: this model exhibits skill over the sample
climatology baseline. The line connecting the result points lies above the
diagonal, indicating that the model slightly underforecasted TC formations. The
erratic behavior in the “higher” probabilities, as captured by the error bars, is
likely due to the drop in the number of points in those bins. From these reliability
diagrams, we obtain an approximate BSS of 0.02852, reliability of 0.000065693,
resolution of 0.00032856, and uncertainty of 0.0092171.
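
The approximate BSS quoted above follows from the three components through the standard decomposition of the Brier score, in which the skill score relative to sample climatology equals resolution minus reliability, divided by uncertainty (see Wilks 2006). The short check below uses only the values given in the text; the variable names are ours.

```python
# Components read from the reliability diagrams, as quoted in the text.
reliability = 0.000065693
resolution = 0.00032856
uncertainty = 0.0092171

# Brier score decomposition: BSS = (resolution - reliability) / uncertainty.
bss_from_components = (resolution - reliability) / uncertainty
```

The result rounds to 0.02852, in agreement with the approximate BSS obtained from the diagrams.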



Figure 22. ROC diagram for the zero-lead hindcasts over the peak formation
season for the years 1982 to 2006.

In addition to having skill, a worthwhile predictive method must also offer
utility and value to the user. The relative operating characteristic (ROC) diagram
and economic value diagram (EVD) are two graphical tools that one may use to
ascertain whether a method may offer such value and utility. Figure 22 shows
the ROC diagram for the zero-lead hindcasts over the aforementioned
verification period. A diagonal line (not shown) connecting (0,0) with (1,1) would
represent zero resolution or no discrimination. Forecasts with better
discrimination have ROC curves approaching the upper-left corner of the
diagram (Wilks 2006). From Figure 22, one can see that the model exhibits fair
discrimination and offers potential utility to the user. Along with the ROC
diagram, one may calculate a ROC skill score (ROCSS), which has a value of
one for a perfect forecast and is less than zero if the forecast is worse than the
sample climatology forecast. The ROCSS for these hindcasts is 0.68325.
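
A ROC curve and its skill score can be sketched as follows. Each probability threshold converts the forecasts into yes/no warnings, giving one (false alarm rate, hit rate) point. The skill score here assumes the common definition ROCSS = 2A - 1, where A is the area under the curve (Wilks 2006); the text does not state which formula was used, and the function names and threshold choices are hypothetical.

```python
import numpy as np


def roc_points(prob, obs, thresholds):
    """False alarm rates and hit rates for a set of probability thresholds."""
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=bool)
    false_alarm_rates, hit_rates = [], []
    for thr in thresholds:
        warn = prob >= thr
        hit_rates.append(np.sum(warn & obs) / np.sum(obs))
        false_alarm_rates.append(np.sum(warn & ~obs) / np.sum(~obs))
    return np.array(false_alarm_rates), np.array(hit_rates)


def roc_skill_score(prob, obs, thresholds):
    """ROCSS = 2 * area - 1, with the area from the trapezoidal rule."""
    far, hr = roc_points(prob, obs, thresholds)
    # Close the curve with the end points (0, 0) and (1, 1).
    far = np.concatenate([[0.0], far, [1.0]])
    hr = np.concatenate([[0.0], hr, [1.0]])
    order = np.lexsort((hr, far))   # sort by false alarm rate, then hit rate
    far, hr = far[order], hr[order]
    area = np.sum(np.diff(far) * (hr[1:] + hr[:-1]) / 2.0)
    return 2.0 * area - 1.0
```

Under this definition a forecast with no discrimination lies on the diagonal and scores zero, while a forecast that separates events from non-events perfectly scores one.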



Figure 23. EVD for the zero-lead hindcasts over the peak formation season for
the years 1982 to 2006.

An EVD, as shown in Figure 23, plots value score versus cost/loss (C/L)
ratio, and is a representation of the potential value added by following the
forecast guidance for each customer (as defined by their C/L ratio). While initially
one may not be impressed by the EVD in Figure 23 due to its skew, this EVD
actually depicts significant potential value for risk averse customers. Whether a
customer defines their C/L ratio in terms of dollars, sortie hours, or crew morale,
most customers would be risk averse (low C/L ratio) to a hit by a TC. A
hypothetical example may be in order. Let us imagine a cruiser is steaming
towards Subic Bay and the forecast calls for a TC. The captain can either divert
around the storm at an additional cost of $100,000 above and beyond typical
operating costs, or maintain course and, if the cruiser is hit, suffer damages
worth $1M in equipment and lost time. With these numbers, this customer would
have a C/L ratio of $100,000/$1,000,000, or 0.1, and thus should be highly risk
averse. For such a customer, the EVD indicates that the model has the potential
to be very valuable in mission planning. While this example is grossly
oversimplified, it reveals the basic idea associated with
the EVD and, thus, the potential benefits of this model.
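
The value score plotted in an EVD can be sketched with the static cost/loss model in its commonly used form, in which the mean expense of acting on the forecasts is compared with the expenses of acting on climatology alone and on perfect forecasts. This formulation is an assumption; the thesis text does not give the formula behind Figure 23, and the function name and contingency counts below are hypothetical.

```python
def value_score(hits, false_alarms, misses, correct_negatives,
                cost_loss_ratio):
    """Relative economic value for one user, with the loss scaled to 1.

    Undefined (division by zero) when the event never or always occurs,
    or when the cost equals the loss.
    """
    n = hits + false_alarms + misses + correct_negatives
    base_rate = (hits + misses) / n
    # Pay the cost on every warning; suffer the loss on every miss.
    expense_forecast = (hits + false_alarms) / n * cost_loss_ratio + misses / n
    # Climatology: always protect or never protect, whichever is cheaper.
    expense_climate = min(cost_loss_ratio, base_rate)
    # Perfect forecasts: pay the cost only when the event occurs.
    expense_perfect = base_rate * cost_loss_ratio
    return ((expense_climate - expense_forecast)
            / (expense_climate - expense_perfect))


# The cruiser example: C/L = $100,000 / $1,000,000.
captain_ratio = 100_000 / 1_000_000
```

A value of one means the forecasts are as useful to that user as perfect forecasts, and a value of zero or less means the user would do at least as well relying on climatology.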

2. Qualitative Verification 

While qualitative verification is often not as definitive as quantitative
verification, it does offer the advantage of allowing us to verify using purely
independent data. Options for independent data include using runs generated
with a year left out of model development, then verifying over that excluded
year, or using another variable source (such as R1 data for atmospheric
variables).


Figure 24. Example of contoured, seven-day summed probabilities, centered
about the 236th day (24 August) of 2001, constructed from R2 and OISST
fields. The red dot indicates the verification point for a TC that formed on
24 August 2001.

Figure 24 is a contour plot of probabilities for the period of 21-27 August
2001. The model used in generating this plot was trained over a period that
excluded the year 2001; therefore, this plot was generated with independent
data. Plots such as that in Figure 24 indicate that the methodology proposed in
this thesis may prove beneficial: this zero-lead hindcast shows “high”
probabilities in a pattern that resembles reverse monsoon trough conditions,
which differ markedly from the typical climatological monsoon trough conditions
in August.


Figure 25. Example of contoured, seven-day summed probabilities, centered
about the 309th day (5 November) of 2001, constructed from R2 and
OISST fields. The red dot indicates the verification point for a TC that
formed on 5 November 2001.


Figure 25 shows a zero-lead hindcast in which the pattern of model
probabilities resembles the pattern that might be expected from climatological
monsoon trough conditions. Even when the model patterns resemble climatology,
the model may add value by providing a more accurate prediction of the
magnitude of the probabilities, as discussed in the next section.

3. Comparison to Climatology

The preceding plots provide hope as to the potential usefulness of our 
proposed method for predicting TC formations. One may wonder how this 
method compares to climatology, but with climatology comes the question of 
what form of climatology is the best against which to compare our method. See 
Appendix A for a brief discussion on the various forms of climatology one may 
select. 

The idea of hits and misses is commonplace in verification, and one we 
shall use here. A simple subtraction of the climatological formation probability 
from the hindcast probability at every day grid block yields a difference matrix. 
Using the JTWC best track formation points, a hit (miss) is defined as occurring 
when the difference at the day grid block of formation is positive (negative). 
Scoring over the years 1982 through 2006, our model had 681 hits and 81 
misses, for a hit rate of 89%. 
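
The hit and miss scoring against climatology can be sketched as below. The function name and the (day, lat, lon) index layout are assumptions, and formation points at which the difference is exactly zero are counted as neither a hit nor a miss, a case the text does not address.

```python
import numpy as np


def score_against_climatology(hindcast, climatology, formation_indices):
    """Count hits and misses at the observed formation points.

    hindcast and climatology are arrays of probabilities on the same
    (day, lat, lon) grid; formation_indices is a list of index tuples
    marking the day grid block of each formation.
    """
    difference = np.asarray(hindcast) - np.asarray(climatology)
    values = np.array([difference[idx] for idx in formation_indices])
    hits = int(np.sum(values > 0))     # model more favorable than climatology
    misses = int(np.sum(values < 0))   # model less favorable than climatology
    return hits, misses, hits / (hits + misses)
```

With the counts reported above, 681 / (681 + 81) is approximately 0.89.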


Figure 26. Plot of the difference matrix resulting from subtracting climatological
probabilities from hindcast probabilities centered on 24 August 2001, the
same day used in Figure 24. Green dots denote the formation points for
the four storms that formed within the seven-day period of 21-27 August
2001.


One may also plot this difference matrix; Figure 26 is an example of such
a plot. Warm (cool) colors represent regions where the probabilities from the
model are higher (lower) than the climatological probabilities. This approach is
akin to an anomaly forecast, where the positive regions may be interpreted as
having a greater than normal likelihood of TC formation. The period of 21-27
August 2001, which is depicted in Figures 24 and 26, was unusually active along
20°N. This difference product highlights this activity, but it also indicates that the
probability of formation along the climatological position of the monsoon trough is
lower than climatology suggests. In some cases, knowing that formation is less
likely in a region when compared to climatology may be just as beneficial as
knowing that formation is more likely in some other region.

4. Climate Oscillations 

Section II.C.4 briefly introduced the impacts of ENLN on TC formation. If
our model accurately depicts the favorability of the large-scale environment, then
it should depict a shift in the probabilities associated with the changes in the
large-scale environment that are associated with ENLN.


Figure 27. Average daily probabilities for the JASO period from composited El
Nino years (top) and La Nina years (bottom).

Defining ENLN years based on the Oceanic Nino index (ONI), we can take
1982, 1987, 1991, and 1997 as classic (non-Modoki) El Nino years and 1985,
1988, 1999, and 2000 as La Nina years. Averaging the daily probabilities from
the zero-lead hindcasts over the July, August, September, and October (JASO)
period yields the probability patterns shown in Figure 27. Note the shift in the
highest probability regions between the two plots; these shifts are similar to those
described in prior studies of the impacts of ENLN on TC formations (e.g., Ford
2000). For example, the high probabilities that extend farther to the east during
the El Nino years are representative of the eastward shift of the regions of warm
water, low-level cyclonic flow/convergence, and low vertical wind shear from their
climatological positions. In contrast, slightly higher probabilities near the
Maritime Continent in the bottom panel of Figure 27 are due to the westward shift
of favorable LSEFs during La Nina years.

5. Conditional Climatologies 

Another potential use for our model that emerged during this research was
the possibility of creating conditional climatologies in the manner of constructed
analogues. The underlying idea is that rather than generating a climatology plot
based on the raw formation data, we could generate a plot based on
model-generated probabilities. This approach could be as basic as generating an
annual climatology based on LTM conditions, or as complex as conditioning
based on time of year, ENLN, et cetera.


Figure 28. Probabilities from LTM JASO R1 and OISST variables. The red
dots indicate the formation points for all JASO TCs from 1971 - 2000.

Figure 28 depicts contours of daily probabilities for the JASO period,
based on LTM R1 and LTM OISST LSEF values composited over the years
1971 - 2000 and 1982 - 2000, respectively. The period of 1971 - 2000 is used for
this and other long-term mean conditions, as it represents the current World
Meteorological Organization (WMO) standard, 30-year climatology period. This
plot is not a perfect representation of the raw climatology; for example, the
formation points clustered around 25°N and 165°E are not captured well by the
contours despite the density of storms in that location.


[Panel title: Constructed 7-Day Probs for June Entering Classic El Nino]

Figure 29. Contoured, seven-day probabilities, constructed from R2 and
OISST fields. The red dots indicate the verification points for TCs that
formed during such conditions.

The concept of a constructed analogue is combining past anomaly 
patterns such that the resulting combination reflects the desired state of the 
climate (van den Dool 2007). As an example, we constructed a probability plot 
for the month of June when the climate system is entering into an El Nino 
pattern. Using the ONI, such conditions were met during the years 1991, 1997, 
and 2002. Averaging the probabilities from our model for these three months 
(one month each for three years) and dividing to give us seven-day probabilities, 
results in what is depicted in Figure 29. In essence, the result is an improved
representation of expected probabilities for a week in the month of June when an
El Nino event is developing.

Though this method is not without limitations, it has the notable
advantage of not requiring dynamical input. As a result,
this constructed analogue approach may be useful for providing tropical activity
outlooks at extended lead times.

6. Verification Against Deep Convection 

As noted in the beginning of the section on verification, many of the 
verification methods we have discussed thus far are problematic because they 
verify against observed TC formations, even though the model predicts the 
propensity for formation, not actual formations. Thus, we chose to also verify 
against outgoing longwave radiation (OLR), since low OLR values indicate deep 
convection and thus a large-scale environment that is likely to be favorable for 
TC formation. 


Figure 30. Comparison of zero-lead hindcast probabilities (top) and OLR
(bottom). OLR image provided by Physical Sciences Division, Earth
System Research Laboratory, NOAA, Boulder, Colorado, from their Web
site at http://www.esrl.noaa.gov/psd/.

Figure 30 is presented as an example of model-derived probabilities for 28
October through 3 November 2006 and the corresponding NOAA Interpolated
OLR. Note in the tropics the general correspondence between the higher
probabilities and the low OLR values (cool colors) that correspond to cold high
cloud tops and deep convection. This sort of correspondence indicates that the
model is capable of identifying deep convective regions that are favorable for TC
formation, and has the potential to be useful in intraseasonal predictions of
tropical convective activity. To operationalize such an approach for predicting
convective activity, the Coriolis term should be removed from the regression
model.


7. Verification in Other Basins 

This final form of verification is one that tests whether the model truly
represents a physically sound combination of LSEFs. Earlier authors presented
their genesis parameters as relevant to TC formations in all tropical ocean
basins. As a result, one is left to wonder how the model, as described in Table 1,
would perform on fully independent data in basins other than the WNP. Figure
31 is an example of a probability plot that results when the Pacific-trained model
is used to generate probabilities for the North Atlantic basin.

Figure 31. Example of contoured, seven-day summed probabilities over the
Atlantic basin, centered about the 233rd day (21 August) of 2006,
constructed from R2 and OISST fields using the model trained on the
WNP. The black dot indicates the verification point for a TC that formed
on 21 August 2006.


Quantitative verification of the storms that developed into tropical storms
or hurricanes in the Atlantic during the months of June through November and
years 1982-2006 yields promising results. Over that period, hits number 273 and
misses 16, with a BSS of 0.019476 (0.018584...0.020306) and a ROCSS of
0.58959. These positive results support what was proposed by authors such as
Gray and Frank: that the same set of LSEFs influences TC formation regardless
of the ocean basin. Though the model is likely better tuned if trained over the
basin over which it will be used as a predictive tool, this comparison suggests
that one basic model may be skillfully applied to multiple basins.

8. Model Shortcomings

Two potential shortcomings were identified in the verification of the
zero-lead hindcasts. Both of these shortcomings deal with the post-formation
environment. First, the conditions that follow the formation day are likely to be
represented by the model as remaining favorable for formation, despite a TC
having already formed. The impacts of this shortcoming may be minimized by
noting that the probabilities represent the favorability of the large-scale
environment for TC formation, and that if a TC forms, the high probabilities may
represent the likely track of the storm. Second, TCs may act to enhance or
suppress the formation of other tropical cyclones (Frank 1982). Due to the
coarse resolution of the CFS, it may poorly represent the TC-environment
feedback. Further study would be required to assess the impacts of this second
shortcoming, though such research ventures beyond the scope of this thesis.

C. VARIATIONS OF THE REGRESSION MODEL 

The previous verification sections have tested a model containing terms 
for 850 mb relative vorticity, 850 mb relative vorticity squared, SST, vertical wind 
shear, Coriolis parameter, and 200 mb divergence, and trained over the peak 
formation period for the years 1982-2006. Through the course of this thesis 
research, numerous forms of the model were tested, in addition to this final 
model. For example, we varied the training period of the model, such as training 
the model over the entire year and over JASO, rather than just over the peak 
formation period. We also investigated the inclusion and/or combination of other 
variables as noted earlier in Section III.A. Using the suite of metrics and 
verification techniques listed in Chapter II, we selected the final model from 
among the many tested. For the sake of brevity, only verification for the final 
model has been presented in this thesis. 

D. FINDINGS FROM CFS CASE STUDIES 

The previous sections have explored the validity of the statistical model in 
identifying likely formation regions. The associated verification metrics represent 
the potential skill, value, and applications as defined by the zero-lead hindcasts. 
In this section we demonstrate the ability to use the CFS as the source of LSEF 
values with which to force the regression model and generate forecast 
probabilities (see Figure 18). The availability and format of CFS output fields 
preclude the use of many of the quantitative verification metrics that were 
possible with the reanalysis-based, zero-lead hindcasts. As a result, in order to 
investigate the predictive potential of the proposed technique, we will present a 
pair of case studies. The first case study is of a pair of storms from 2008 using 
operational CFS data; the data used for Case 1 are exactly what is readily 
available on a daily basis and could be used to operationalize the method 
proposed in this thesis. The second case study is one from 2003 using archived 
CFS hindcast fields. Plots from some additional case studies are included in 
Appendix C. 

1. Non-Zero Lead Hindcasts: Case 1 

TC activity in the 2008 TC season in the WNP was relatively low, for 
reasons that are not yet clear. From this low activity season, we examined two 
rather low intensity TCs. Our model should be robust enough to predict TCs in 
low activity seasons and TCs that do not reach high intensities. The only thing 
that may be notable about these two TCs, Mekkhala (20W) and Higos (21W), is 
that the JTWC has traced their origins back to the same day in 2008. 

Disturbances that would develop into Mekkhala and Higos were identified 
as early as 27 September (see Figure 32 for formation points). Mekkhala, 
developing in the South China Sea, would grow to tropical depression strength 
by the following day, and be a named tropical storm another day later, on 29 
September. Similarly, Higos, forming in the WNP, would reach tropical 
depression strength, and then tropical storm strength on 29 September. 

Figure 32. CFS ensemble mean probabilities from runs initiated on 13 
September 2008, valid 24-30 September 2008. Formation points (solid 
circles) and tracks (open circles) are included for Mekkhala (magenta) and 
Higos (green). 


Figure 32 is a plot of the mean seven-day probabilities from the four- 
member ensemble. From this plot alone, it appears the CFS predicted the 
potential for above-average TC activity in the greater monsoon trough region at a 
two-week lead (tau: 336 hours). The difference plot in Figure 33 confirms that 
the CFS-based probabilities were higher at both formation points than what 
climatology would have provided. 
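
The difference plot described above can be sketched as follows: average the member probabilities at each grid point, then subtract the climatological probabilities for the same period. This is a minimal illustration, assuming the member probabilities are stored as an array of shape (members, latitude, longitude); the function and argument names are hypothetical and are not taken from the code used in this research.

```python
import numpy as np


def probability_difference(member_probs, climo_probs):
    """Return (ensemble mean, ensemble mean minus climatology).

    member_probs: array of shape (n_members, n_lat, n_lon)
    climo_probs:  array of shape (n_lat, n_lon) for the same valid period
    """
    member_probs = np.asarray(member_probs, dtype=float)
    climo_probs = np.asarray(climo_probs, dtype=float)
    # Average across the member axis to get the ensemble mean field.
    ens_mean = member_probs.mean(axis=0)
    # Positive values mark grid points more favorable than climatology.
    return ens_mean, ens_mean - climo_probs
```

Positive differences correspond to regions where the CFS-based probabilities exceed climatology, and negative differences to regions where they fall below it.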

Figure 33. A probability difference plot of the CFS ensemble mean 
probabilities (as in Figure 32) minus the climatological formation 
probabilities for the same period. Formation points are included for 
Mekkhala (magenta) and Higos (green). 


[Figure 34 panel titles: a) Member 1 Probabilities ... d) Member 4 Probabilities (Mekkhala & Higos, 2-Week Lead)]

Figure 34. Seven-day probabilities from each of the four ensemble members, 
initiated on 13 September 2008, valid 24-30 September 2008. Formation 
points are included for Mekkhala (magenta) and Higos (green). 


Figure 34 separates the ensemble mean plot in Figure 32 into individual 
ensemble members. Recall that the members are identical models, but have 
different initial conditions and/or initiation times (00Z or 12Z). These minor 
variations between the members do result, as shown in Figure 34, in pronounced 
differences after a two-week integration. A quick comparison of these seven-day 
probabilities reveals that no one member performed better than the others for 
both of these storms, although member four appeared to strongly predict the 
development of Higos. 

The contoured probability plots like those in Figure 34 represent summed 
daily probabilities. In addition to the spatial variability of the individual members, 
we could also analyze the temporal variations between the members. This 
additional degree of variability is not shown in this report, although the variations 
are what one would expect when comparing runs of any dynamical model— 
timing differences exist from run-to-run. The spatial and temporal variability 
between members highlights what was first mentioned in Section II.B.4, that the 
ensemble approach smooths out differences between the runs and highlights 
the more predictable elements of the climate system. Thus, this ensemble 
approach should lead to enhanced predictive skill overall, although there will, of 
course, be exceptions. 
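
The summing and ensemble steps discussed above can be sketched as follows. This is a minimal illustration, assuming each member supplies seven daily probability fields and that the seven-day value is a simple sum of those daily fields. Using the standard deviation across members as the measure of spread is our assumption for this sketch, and the function names are hypothetical.

```python
import numpy as np


def seven_day_sum(daily_probs):
    """Sum seven daily probability fields into one seven-day field.

    daily_probs: array of shape (7, n_lat, n_lon)
    """
    daily = np.asarray(daily_probs, dtype=float)
    if daily.shape[0] != 7:
        raise ValueError("expected exactly seven daily fields")
    return daily.sum(axis=0)


def ensemble_mean_and_spread(member_daily_probs):
    """Return the ensemble mean and spread of the seven-day sums.

    member_daily_probs: array of shape (n_members, 7, n_lat, n_lon)
    Spread is taken here as the standard deviation across members
    (an assumption made for this sketch).
    """
    weekly = np.stack([seven_day_sum(m) for m in member_daily_probs])
    return weekly.mean(axis=0), weekly.std(axis=0)
```

Large spread relative to the mean marks regions where the members disagree, which is where the ensemble mean smooths out the most run-to-run variability.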

As highlighted earlier, ensemble member four appeared to capture the 
formation of Higos. We explored the individual LSEFs that contributed to the 
probabilities plotted in panel d of Figure 34. Figure 35 displays those LSEFs for 
the day of formation, 27 September 2008. 

[Figure 35 panel titles, 9/27/08, Mekkhala & Higos, 2-Week Lead: 850 mb Rel. Vor. (/s); Shear (m/s); 200 mb Div. (/s)]

Figure 35. Individual LSEFs from ensemble member four for the formation 
day, 27 September 2008. 

The panels in Figure 35 depict the LSEFs for the formation day in the 
order of their statistical significance in the regression model: a) 850 mb relative 
vorticity, b) 850 mb relative vorticity squared, c) SST, d) vertical wind shear, and 
e) 200 mb divergence. The Coriolis term is not shown, as it is a simple function 
of latitude, and thus does not vary by member or run. The regression model, 
when applied to variables from member four, predicted the highest probabilities 
of formation for the week centered on the formation day to be near 10°N and 
140°E, very close to the actual formation location. Though the panels in Figure 
35 are for the formation day alone, they reveal why the high probabilities are 
predicted where they are. The region surrounding 10°N and 140°E is forecast 
to experience high low-level relative vorticity, very warm SSTs, and positive 
upper-level divergence, and to lie near a low-shear zone. 


Figure 36. Winds at a) 200 mb and b) 850 mb from a two-week lead of 
ensemble member four valid for the formation day, 27 September 2008. 


The vorticity, vorticity squared, shear, and divergence terms included in 
the regression model are all calculated from the zonal and meridional winds 
available from the CFS. Figure 36 depicts the 200 mb and 850 mb winds from 
member four on formation day; the component winds used to create these 
full-wind plots are the same as those used to calculate the variables in Figure 35, except SST. 
Note the cyclonic circulation at 850 mb and anticyclonic outflow at 200 mb 
forecasted by member four for the formation day at a two-week lead. 
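
To make the link between the component winds and the wind-derived LSEFs concrete, the following is a minimal sketch that computes relative vorticity and divergence on a regular latitude-longitude grid using finite differences in spherical coordinates. It is not the code used in this research. The function names and the Earth-radius constant are ours, and defining vertical wind shear as the magnitude of the 200 mb minus 850 mb vector wind difference is an assumption made for this sketch.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # mean Earth radius (m)


def vorticity_and_divergence(u, v, lat_deg, lon_deg):
    """Relative vorticity and divergence (1/s) from winds on a lat-lon grid.

    u, v:    zonal and meridional wind (m/s), shape (n_lat, n_lon)
    lat_deg: 1-D latitudes in degrees (poles excluded)
    lon_deg: 1-D longitudes in degrees
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    lon = np.deg2rad(np.asarray(lon_deg, dtype=float))
    a = EARTH_RADIUS_M
    coslat = np.cos(lat)[:, None]
    tanlat = np.tan(lat)[:, None]
    # Finite-difference derivatives with respect to latitude and longitude.
    du_dlat = np.gradient(u, lat, axis=0)
    dv_dlat = np.gradient(v, lat, axis=0)
    du_dlon = np.gradient(u, lon, axis=1)
    dv_dlon = np.gradient(v, lon, axis=1)
    # Spherical-coordinate forms, including the metric (tan) terms.
    zeta = dv_dlon / (a * coslat) - du_dlat / a + u * tanlat / a
    div = du_dlon / (a * coslat) + dv_dlat / a - v * tanlat / a
    return zeta, div


def vertical_wind_shear(u200, v200, u850, v850):
    """Magnitude (m/s) of the 200 mb minus 850 mb vector wind difference.

    ASSUMPTION: this shear definition is chosen for illustration.
    """
    du = np.asarray(u200, dtype=float) - np.asarray(u850, dtype=float)
    dv = np.asarray(v200, dtype=float) - np.asarray(v850, dtype=float)
    return np.hypot(du, dv)
```

The vorticity-squared term then follows directly as the square of the 850 mb vorticity field, so all four wind-derived terms come from the same pair of component winds at two levels.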

In this particular case study, a well-trained forecaster might have been 
able to use just the CFS output fields (as in Figure 36) to foresee the 
development of Higos around 10°N and 140°E. Some readers may then 
ask why one would not just use the available CFS dynamical output to 
forecast tropical cyclogenesis. Many of the potential benefits of the combined 
statistical-dynamical approach have been noted implicitly elsewhere in this 
thesis. We feel that from the dynamical perspective, employing an ensemble 
minimizes the impacts of spatial and temporal errors within the model. Had we 
analyzed member three, rather than member four, in the preceding figures, we 
would have seen that both the timing and strength of the circulation were 
inaccurate; a forecaster relying on that member alone would likely have 
misforecast the formation of Higos. A reason why operational numerical weather prediction is 
seldom used beyond ten days to two weeks is that longer leads are often beyond 
the limit of predictability of individual weather elements. Exploiting the expanded 
predictability of the large-scale circulations and ocean memory may extend the 
predictability of this combined method, vice the predictability of individual 
elements. Furthermore, the regression model represents a physically and 
statistically sound combination of LSEFs, which allows one to produce a reliable, 
repeatable prediction of TC formation. Rather than having to intuitively compare 
multiple output fields and subjectively generate a forecast, the contoured plots 
from the proposed method are easily generated and interpreted by forecasters or 
users. For these reasons, we feel that this combined statistical-dynamical 
method is a viable approach to intraseasonal prediction of tropical cyclogenesis. 

[Figure 37, top panel title: Ens. Mean (contour) & Spread (fill): Mekkhala & Higos, 2-Week Lead]

Figure 37. Comparison of CFS-based TC formation probabilities in the form of 
an ensemble mean/spread plot (top) and OLR (bottom) for the same 
period. OLR image provided by Physical Sciences Division, Earth System 
Research Laboratory, NOAA, Boulder, Colorado, from their Web site at 
http://www.esrl.noaa.gov/psd/. 

As noted earlier, in addition to intraseasonal prediction of tropical 
cyclogenesis, this method appears to highlight regions of likely tropical deep 
convection. Figure 37 is a comparison of CFS-based forecast probabilities, in 
the form of an ensemble mean/spread plot, and OLR for the period of 24-30 
September 2008. Whether verified against the formation of Mekkhala and Higos, 
against the difference from climatology, or against deep convection, the CFS-based 
probabilities from this case study show promise for this combined approach at a 
lead time of two weeks. 

2. Non-Zero Lead Hindcasts: Case 2 


As a second case study into the predictive potential based on CFS, we 
focused on Ketsana and Parma (20W and 21W, respectively), two storms that 
formed on 18 October 2003. Rather than constructing a four-member ensemble 
from the operational CFS, we used the archived ensemble mean from the CFS 
hindcast project. This ensemble mean is an average of all 15 members 
initialized in one month from the CFS hindcast project. As a result, the initial 
conditions of the ensemble mean are staggered over the period of a month. Like 
other CFS runs, the integrations extended out to nine months. These ensemble 
mean runs are available once per month in the CFS archive, with the valid times 
beginning on the ninth day of every month. Thus we were able to work with a 
nine-day lead (tau: 144 hours) and a 39-day lead (tau: 864 hours) in this case 
study. 



Figure 38. Contoured, seven-day summed probabilities, centered about 18 
October 2003, constructed from a) R2 and OISST and b) R1 and OISST 
fields. The red dots indicate the formation points for Ketsana (right) and 
Parma (left). 

This second case study was chosen not for its perceived performance 
based on the CFS, but rather for its unusual reverse-oriented monsoon trough 
and the high probabilities visible in the zero-lead hindcast (Figure 38). Figure 38 
displays the high probabilities that extend SW to NE over the WNP when using 
both the R1 and R2 reanalyses. The strong similarity between the R2-based 
(top) and R1-based (bottom) plots suggests that our model is not overly sensitive 
to the specific analysis and assimilation system. The logical question that follows 
is whether the 15-member CFS ensemble mean would predict this unusual 
activity. 


Figure 39. Contoured, seven-day probabilities, centered on 18 October 2003, 
constructed from the archived CFS ensemble mean at a nine-day lead. 
The red dots indicate the formation points for Ketsana and Parma. 


To assess the predictive potential, we first investigated the nine-day lead 
forecast. Figure 39 depicts the probabilities of TC formation for the period 15-21 
October 2003, based on archived CFS ensemble mean fields with a nine-day lead 
from the day of formation. The formation points for both Ketsana and Parma are 
included within the 0.5% minimum contour. 

Figure 40 is the same as Figure 39, but from fields with a 39-day lead from 
the day of formation. While the contours do suggest activity around 15°N, the 
CFS-based probabilities at such a lead are notably different from the 
reanalysis-based, zero-lead probabilities (Figure 38) and do not indicate reverse monsoon 
trough conditions. 



Figure 40. Contoured, seven-day probabilities, centered on 18 October 2003, 
constructed from the archived CFS ensemble mean at a 39-day lead. The 
red dots indicate the formation points for Ketsana and Parma. 

Visual comparisons between Figure 38 and Figures 39 and 40 indicate 
differences between the CFS-based probabilities and the reanalysis-based 
probabilities in both magnitude and spatial distribution. As noted earlier, this 
case was chosen, in part, because of the high probabilities found in the zero-lead 
hindcast; both formation points were predicted with probabilities on the order of 
0.1 (10%). In contrast, the CFS-based probabilities at the 
formation points range from approximately 0.004 to 0.013. Also, the reanalysis- 
based probabilities depict favorable formation in a reverse-oriented monsoon 
trough pattern, while the CFS-based plots show a poleward extension of the 
contoured probabilities from the climatologically-favored monsoon trough region. 

One is left to wonder what accounts for the difference between the CFS- 
based and reanalysis-based probabilities. Is it a weakness of the regression 
model and/or of the CFS? Is something unique about this case that is causing 
these differences? 

Figure 41. Comparison of 850 mb winds for the period 15-21 October 2003, 
from a) nine-day lead from the CFS ensemble mean and b) zero-lead R2 
data. Note the different scales. 

Figure 42. Comparison of 200 mb winds for the period 15-21 October 2003, 
from a) nine-day lead from the CFS ensemble mean and b) zero-lead R2 
data. 

Figures 41 and 42 are comparisons of the 850 mb winds and 200 mb 
winds, respectively, from averaged CFS ensemble mean output (at a nine-day 
lead) and averaged R2 data (at zero lead) for the same period, 15-21 October 
2003. As discussed in Case 1, the magnitude and distribution of the 
model probabilities are sensitive to these wind fields. From Figures 41 and 42, one 
can start to hypothesize why the probabilities differ when the regression 
model is forced with CFS rather than R2 LSEF values. For example, the 850 mb 
winds (Figure 41) are similar in direction in most locations except the region 
extending from 125°E to 150°E and straddling 10°N. The robust westerlies 
indicated there by the R2 data, at zero lead, have a profound impact on the 
reanalysis-based probabilities, in that they increase the vertical wind shear in that region 
and amplify low-level relative vorticity to the north. As a result, the region from 125°E 
to 150°E and straddling 10°N is no longer favorable for TC formation, while 
the probability of TC formation is enhanced to the immediate north of the 
westerlies. These westerlies were not predicted by the CFS fields at a nine-day 
lead; therefore, the climatologically favored location for TC genesis is not 
displaced. The differences in the 200 mb winds are not as profound. Overall, it 
appears that temporally summing the bias-corrected ensemble mean fields tends 
to smooth the CFS fields such that they represent climatology. In the absence of 
any other predictable elements, seeing the CFS tend towards climatology is 
reassuring. This tendency is likely due in part to the bias correction we applied to 
the CFS output. 

Figure 43. Comparison of a) CFS-based probabilities (repeat of Fig. 39), b) 
probability difference, and c) OLR, for the period 15-21 October 2003. 
The red dots indicate the formation points for Ketsana and Parma. 


From this case study, we observe that the 15-member CFS hindcast 
ensemble mean may be too much like climatology to yield formation probabilities 
that deviate greatly from climatology. Despite the differences between the R2 
and CFS-forecasted 850 mb winds, the probability difference plot in panel b) of 
Figure 43 highlights that the model still predicts probabilities higher than 
climatology in the region. In addition, a visual comparison between the 
CFS-based probabilities and the OLR plot for the same period (Figure 43, panels a) 
and c)) suggests that this period may have been a convectively active period 
across much of the WNP, and that the CFS-based probabilities did a fair job in 
predicting this activity. 

3. General Observations 


The earlier sections on the verification of the zero-lead hindcasts 
established a skillful benchmark for evaluating non-zero lead hindcasts and 
actual forecasts. The two non-zero lead hindcast case studies presented in the 
preceding section indicate that our combined statistical-dynamical method for 
intraseasonal prediction of regions favorable for tropical cyclogenesis has the 
potential to produce useful forecasts from the existing version of the CFS. 

Some of the differences between the CFS-based probabilities and the 
reanalysis-based probabilities are likely due to the differing mechanics of the two 
systems. Though the output we used was at 2.5° horizontal resolution for both 
systems, the effective portrayal of the assimilated observational data is different. 
The R2 assimilates data from a multitude of observational sources directly onto 
its Gaussian grid; therefore, it is conceivable that if a TC were forming or present 
over the WNP, the reanalysis data would represent the TC. While similar data are 
incorporated into the CFS as initial conditions, as the model is integrated forward in 
time, the coarse-resolution numerics and physics mean that the smaller scale 
features in the LSEFs associated with TCs that are forming or present will in 
general be less well represented than in the R1 or R2 fields that force the zero- 
lead hindcasts. Thus, in general, the CFS is likely to predict LSEF magnitudes 
and gradients that are weaker than those in R1 and R2. 

One should recall that dynamical models, especially GCMs, though based 
on physical laws, are unable to resolve all spatial and temporal scales and are 
sensitive to their often-problematic parameterizations. Nevertheless, it is 
important to remember that the CFS is not a simplified physics, coarse resolution 
atmospheric model. Indeed, it is a fully coupled, one-tier dynamical prediction 
system. With our proposed application, the coupling in the CFS is rather 
important. At short lead times, a forecast is mostly affected by atmospheric initial 
conditions. But at longer lead times, the ocean plays a greater role and can 
allow relatively high predictability in time-averaged forecasts. 

We saw with the first case study that applying an ensemble approach to 
the operational CFS may increase the predictability by smoothing out differences 
between the members and enhancing the more predictable elements of the 
climate system. The second case suggested that it might be possible to 
over-smooth by using the archived ensemble mean summed over seven days. It was 
promising, however, that the CFS appears to trend towards a plausible, 
climatological state, rather than toward a model bias state. 

The first case study indicates that it may be possible to use raw output 
fields from the CFS to predict individual TC formations. For the aforementioned 
reasons, we believe that until the single-element predictability is increased in 
dynamical models, using the raw output at daily resolutions will often lead one 
astray at intraseasonal leads. By statistically combining several variables and 
summing temporally, the predictability is likely increased and more reflective of 
the large-scale environment that is known to impact TC development. 



IV. SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS 


A. KEY RESULTS AND CONCLUSIONS 

This thesis is an exploration into the viability of employing a combined 
statistical-dynamical predictive method for forecasting TC formation probabilities 
at intraseasonal time scales. The primary focus of this work was to assess the 
feasibility of using such a method to predict favorable regions for tropical 
cyclogenesis. We also investigated whether this combined statistical-dynamical 
approach appears to result in skill and value beyond that which basic climatology 
provides. 

Our proposed predictive method involves forcing a statistical model with 
available output from a GCM. We began by investigating various atmospheric 
and oceanic variables in order to decide upon which LSEFs, or genesis 
parameters, to include as explanatory variables in our model. The chosen 
statistical model, summarized in Table 1, contains terms for 850 mb relative 
vorticity, 850 mb relative vorticity squared, SST, vertical wind shear, Coriolis 
parameter, and 200 mb divergence. Each of these variables was found to be 
necessary, both statistically and conceptually, but together may not be sufficient 
to forecast actual formation. Multivariate logistic regression was used to develop 
a statistical model for the probability of TC formation based on the favorability of 
the large-scale environment as defined by a linear combination of these LSEFs. 
As an aside, this work with the LSEFs also suggests that the variable thresholds, 
as defined by studies during the past several decades, should be made more 
restrictive. For example, the oft-cited criterion that SST in the WNP must be ≥ 
26.5°C for TC formation may be increased to > 28°C (as suggested by Figure 9). 

The predictive potential of our method was first assessed by thorough 
quantitative and qualitative verification of reanalysis-based, zero-lead hindcasts. 
The model shows great potential, with a BSS of 0.0291 (0.0282...0.0299), an 
ROCSS of 0.683, reliable summed seven-day probabilities, and potential added 
value for risk-averse customers. In addition, the zero-lead hindcasts performed 
well in dealing with climate oscillations, in developing conditional climatologies, in 
verification against deep convection, and even in quantitative verification in the 
Atlantic basin. 
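
For readers less familiar with the Brier skill score quoted above, the following is a minimal sketch of the standard definition, BSS = 1 - BS/BS_ref, where BS is the mean squared difference between forecast probabilities and binary outcomes and BS_ref is the same quantity for a reference forecast (climatology, in this thesis). The function names are ours, and the sketch is not the verification code used in this research.

```python
import numpy as np


def brier_score(forecast_probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    f = np.asarray(forecast_probs, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    return float(np.mean((f - o) ** 2))


def brier_skill_score(forecast_probs, reference_probs, outcomes):
    """BSS = 1 - BS / BS_ref.

    Positive values mean the forecast beats the reference; zero means
    no improvement; one is a perfect forecast.
    """
    bs = brier_score(forecast_probs, outcomes)
    bs_ref = brier_score(reference_probs, outcomes)
    return 1.0 - bs / bs_ref
```

Because TC formation is a rare event at any single grid point and day, BS_ref is already small, so even a modest positive BSS represents an improvement over the climatological reference.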

The second assessment of the predictive potential of this technique came 
by way of two CFS case studies, in which we generated non-zero lead 
hindcasts for past TCs. The availability and format of CFS data confined much of 
the verification of these studies to qualitative methods. We explored an 
ensemble approach as a way to smooth out the spatial and temporal variability 
between members, and highlight the more predictable elements. Both the 
ensemble approach and the combination of LSEFs together lead to expanded 
predictability of the large-scale circulations, vice the limited predictability of 
individual elements. Results from these intraseasonal-lead case studies are 
promising, but also suggest much work remains when it comes to dynamical 
weather prediction on the intraseasonal scale. Purely dynamical intraseasonal 
forecasts are not overly skillful (van den Dool 2007), so our statistical-dynamical 
method appears to be a useful complement to existing alternatives for 
intraseasonal forecasting of TC formations. 

Overall, our method provides a stable, reliable, and repeatable approach 
to intraseasonal TC formation prediction that is applicable throughout the year 
and, apparently, in more than just the WNP basin. Our method allows 
forecasters to objectively and quantitatively merge information about all the 
LSEFs to produce an ensemble-based, probabilistic forecast of the potential for 
TC formation and the favorability of the climate system compared to long-term 
mean climatological probabilities. A single contoured plot spanning a seven-day 
period is easy to interpret and may even be presented directly to users. A typical 
rule of thumb in forecasting is to use a numerical model only when you have 
confidence in its output. While we agree with that mantra, we are intrigued by 
the suggestion that the bias-corrected CFS fields tend toward climatology when 
the predictability in the climate system is low. If such is the case, this method 
could be employed regularly and would, at the very least, depict a probabilistic 
representation of TC formation climatology. 

The concept of climatology appears throughout this thesis, both as the 
reference forecast against which the proposed technique was judged and as a 
potential tool in itself. Not all climatologies are created equal, however. See 
Appendix A for a brief discussion on the variations of climatology used in this 
work. In this thesis, the choice of climatology impacts the verification results. 

Plots of the difference in the probabilities generated by our method and 
those from climatology provide an intriguing presentation of the skill and value of 
our method. Such plots can be viewed as probability anomalies and clearly 
reveal where our method predicts higher and lower likelihood of formation than 
climatology. Operationally, a forecast for no (or less-likely) activity may be just 
as beneficial as a forecast for highly-probable formation. For example, an 
extended area of probabilities lower than climatology may suggest safer passage 
for a carrier strike group wishing to transit the region. 

Using the data and methods outlined in Chapter II, we believe that the 
model, as described and verified in Chapter III, presents a viable approach to 
intraseasonal prediction of tropical cyclogenesis. The numerous preceding 
pages were presented not as a testament to the amount of code written or the number of 
variations tested in this research, but rather as an explanation and validation of 
this combined statistical-dynamical approach in intraseasonal TC prediction. 

B. APPLICABILITY TO DOD OPERATIONS 

O’Lenic et al. (2007), in discussing recent developments in operational 
long-range climate prediction at CPC, state “improvements in the science and 
production methods of LRFs [long-range forecasts] are increasingly being driven 
by users, who are finding an increasing number of applications, and demanding 
improved access to forecast information.” While this is encouraging and may be 
true in the civilian sector, we are of the opinion that the preponderance of DoD 
customers do not know of what the Air Force Weather (AFW) and Navy METOC 
communities are truly capable. As the products and procedures of these two 
communities are driven by requirements, if customers do not require a product, it 
will likely go uninvestigated. 

The majority of day-to-day military scheduling and planning is focused on 
operations and exercises that will occur weeks or months later. Translating the 
weeks to months of lead times of the planning realm into meteorological terms, 
we draw a parallel between the time scale of military planning and intraseasonal 
forecasts. In contrast, the preponderance of weather support provided by the 
AFW and Navy METOC communities is focused on short-range forecasting (lead 
times of 72 hours or less) or nowcasting (lead times less than three hours). This 
indicates that weather support is out of synch with the majority of the planning 
done by its military customers. 

Arguably, the planning phase is when weather support may have the 
greatest positive impact on military operations, by alerting planners to the 
potential conditions that may impact their operations, while the planners still have 
time to mitigate the impacts of some environmental conditions and exploit the 
opportunities provided by other environmental conditions. For planners of many 
military operations, short-range forecasts come too late in the process to have 
much influence on the planning. In many of these cases, skillful long-range 
forecasts (e.g., lead times of two weeks or longer) could be very useful in 
determining where and when to conduct an operation, what assets and tactics to 
employ, etc. (personal communication CDR Van Gurley 2005; CDR Tony Miller 
2009). 

Due to a lack of freely available forecast products at the intraseasonal 
scale, even an accessible, understandable depiction of climatology or of a 
conditional climatology has potential value for military planners. The DoD lacks 
many such products. Previous theses (e.g., Tournay 2008; Moss 2007) and 
sections from this report highlight the power of state-of-the-science climatology, 
or “smart” climatology. Creating state-of-the-science climatologies (using the 
latest data sets, knowledge of climate oscillations, etc.) offers a significant 
improvement in environmental intelligence for DoD planners. 

Active intraseasonal prediction has the potential to add value beyond 
climatology. By exploiting the predictability within the climate system, via 
statistical, dynamical, or combined methods, skillful weather information may be 
provided to military planners and operators. It is important for military centers to 
undertake such prediction in addition to civilian centers, as the military is often 
focused on regions and variables not covered by civilian products. For example, 
civilian forecasting centers generally focus on TC landfall locations or the number 
of TCs in a season. While TC landfall and seasonal counts are important, for the 
military, information at much greater temporal and spatial resolution, and over the 
open ocean, would likely prove beneficial. For example, Navy and Air Force 
planners would benefit from insight into periods and regions safe for ship and 
aviation operations. The technique proposed in this thesis has other benefits as 
well. Among these is that an operational version of this process could 
be fully automated, with products delivered to forecasters and 
customers in multiple formats, including those for geographic information 
systems. 

As evidenced by the demands placed on civilian forecast centers from 
customers, one is led to conclude that if DoD planners and operators saw the 
potential value-added from heeding long-range weather intelligence, they too 
would demand more of it. Products stemming from intraseasonal predictions 
need not be starkly different from short-term forecasts to which customers are 
accustomed. For example, the ship avoidance chart from JTWC (as in Figure 
44) is routinely presented to operators for decision-making. Potential 
deliverables from the method proposed in this thesis could be very similar to 
such ship avoidance charts. In fact, the similarity of products would aid in 
fostering seamless weather support from planning through mission execution, as 
seen from the users’ perspective. 

Figure 44. Example JTWC ship avoidance chart (From 
http://metocph.nmci.navy.mil/jtwc/legend/ship_key.html; accessed 27 
February 2009). 


Whether the mission is a trans-oceanic air bridge, carrier strike group flight 
qualification training, or a major multi-national naval exercise, no current DoD 
products exist, beyond antiquated climatology products, to aid mission planners 
in assessing the likely state of the climate system weeks to months in advance. The 
method proposed in this thesis, and others like it, could add value for numerous 
customers, and certainly has the potential for saving units’ time and tax dollars. 
This thesis represents a test of this concept. We propose that this and similar 
products be presented to customers to see what applications and demands 
emerge throughout the DoD. 

C. AREAS FOR FURTHER RESEARCH 

The previous sections have shown that this approach demonstrates 
intriguing potential, and that ample room remains for further research and 
exploration. 

1. Technique Exploration 

As this was a proof of concept for the technique, further exploration into 
the mechanics of the approach seems prudent. The order of the following ideas 
for future research does not represent priority. 

1) Vary the regression model based on end strength and/or growth 
rate of the included storms. Preliminary work confirms the common thought that 
not all TCs form and behave in the same manner. The method used in this 
thesis was founded on the idea that compositing numerous storms smooths out 
the differences and enhances the features in common. However, could one 
construct a more skillful model if end strength and/or growth rate were taken into 
consideration? 

2) As mentioned in Section III.B.8, some of the apparent shortcomings 
of this model deal with the post-formation environment. We were able to mitigate 
these shortcomings by adjusting the NTCI, filtering out data according to MSLP 
from the model construction process, and including the relative vorticity squared 
term. In order to better highlight the conditions at formation, one should 
uniformly define the formation day in the best track archive and consider 
constructing a regression model excluding data surrounding the track 
post-formation. 

3) Future research should investigate further the best method for 
including NTCI in the development of the regression model. This research 
should attempt to answer questions such as: To what extent should NTCI from 
regions or periods in which TCs have never formed be used to train the model? 
Should all NTCI come just from locations and months in which TCs have been 
observed to occur? 

4) From the available reanalyses and CFS fields, we calculated 
several of our LSEFs using second order centered finite differencing (see 
Appendix B). One may consider using a more advanced method to calculate 
variables, such as Legendre polynomials for meridional differentiation and 
Fourier analysis for zonal spatial differentiation. 

5) As we observed with the CFS case study, a delicate balance 
appears to exist between predictability and resolution (as in the model’s 
difference from climatology). The construct of the current operational CFS allows 
one to readily create a four-member ensemble. While keeping the balance issue 
in mind, one may explore the idea of creating an expanded ensemble by using 
runs initialized on multiple days. Such an approach would more closely resemble 
the approach CPC takes in using the CFS in seasonal forecasting. 

6) A struggle throughout this thesis process concerned the issue of 
how best to verify the propensity for TC formation. Other centers with similar 
spatial forecasts of rare events seem to struggle as well, and no industry 
standard exists for the verification of such products. The approach we took uses 
an assemblage of tools, most of which inevitably verify the propensity for 
formation against actual formations. The issue of verification needs to be 
explored further. Could we numerically score against OLR or some other 
variable that represents favorable LSEFs? 

7) While numerous combinations of possible LSEFs were tested for 
inclusion into the regression equation in this research, additional work could be 
accomplished in this area. Ideal candidates are oceanic variables, such as 
mixed layer depth. In addition, one may consider additional non-linear 
relationships between variables and TC formation or between separate variables. 
For example, we experienced an improvement in our model’s performance by the 
addition of the vorticity-squared term. 

8) Prior work by Meyer (2007) and others indicates that the same 
LSEFs that influence formation may also influence the intensity of a TC. 
Subsequent research may investigate the potential for generating near-term 
estimates for the intensity of a storm that has formed, or may soon form, based 
on the predicted conditions of the large-scale environment. 
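
The expanded ensemble suggested in item 5) can be sketched as a time-lagged ensemble, in which members from runs initialized on several days are pooled for a common valid day. The Python sketch below is illustrative only: the function name, the data layout, and the numbers are hypothetical, and it is not a description of the CPC procedure.

```python
from statistics import mean


def lagged_ensemble_mean(runs, valid_day):
    """Average all members, from all initialization days, valid on valid_day.

    runs maps an initialization day (int) to a list of members; each member
    is a list of daily probabilities whose element 0 is valid on the
    initialization day itself. (Hypothetical layout for illustration.)
    """
    pooled = []
    for init_day, members in runs.items():
        lead = valid_day - init_day
        for member in members:
            if 0 <= lead < len(member):
                pooled.append(member[lead])
    if not pooled:
        raise ValueError("no member covers the requested valid day")
    return mean(pooled), len(pooled)


# Two initialization days, two members each, three forecast days per member.
runs = {
    10: [[0.10, 0.20, 0.30], [0.10, 0.20, 0.50]],
    11: [[0.20, 0.40, 0.10], [0.20, 0.20, 0.30]],
}
prob, n_members = lagged_ensemble_mean(runs, valid_day=12)
```

Pooling over initialization days enlarges the ensemble at the cost of mixing lead times, which is the balance between predictability and resolution noted in item 5).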

2. Data Exploitation 

As mentioned in Section II.B regarding the importance of the reanalysis 
data sets, the logistic regression approach we employed would not have been 
possible if the atmospheric and SST reanalysis datasets were not available. 
Similarly, this approach would not have been viable without the existence of the 
CFS data set, including an extensive hindcast archive. Current and forthcoming 
data sources offer potential avenues through which to improve the combined 
statistical-dynamical method proposed in this thesis. 

1) As noted throughout this thesis, the model was trained on 
reanalysis data and applied in proof-of-concept testing using CFS data. Though 
it would require a substantial storage and coding investment initially, one should 
consider using the CFS to both train and test such a model. In addition to 
accounting for the subtle biases and nuances within the model, this approach 
would allow for the testing of more variables—especially oceanic variables— 
thought to impact TC formation. It was not so much the storage or coding that 
pushed us away from this approach for this thesis, but rather the limited days for 
which hindcast data is available. Would enough storms be captured by a purely 
CFS approach to successfully train and test a regression model? In addition, we 
felt it was important to first use reanalysis values of the LSEFs in building the 
regression model, so that a relatively skillful benchmark based on zero-lead 
hindcasting could be established. But future studies could consider building a 
regression model based solely on forecasted LSEFs. 

2) Short of using the CFS data, one may consider employing an 
ocean reanalysis, or the forthcoming coupled reanalysis from NCEP, to 
investigate the use of oceanic LSEFs other than SST. We hypothesize that a 
term representing mixed layer depth may be a more skillful predictor than SST. 
In addition, such oceanic variables are known to better represent long term 
climate system memory. Thus, the use of oceanic LSEFs better than, or in 
addition to, SST could provide a better match between the model’s terms and the 
climate system variables that best represent intraseasonal predictability. 

3) Low skill in CFS intraseasonal predictions of atmospheric variables 
is likely a weak point for our technique. While there is no reason to believe that 
the current CFS is inferior at such leads compared to other GCMs, one may find 
it worthwhile to explore other GCMs, such as those from the Goddard Space 
Flight Center or Australian Bureau of Meteorology. Though more 
computationally demanding, the most intriguing approach may be to employ a 
multi-model ensemble approach to generate the necessary LSEF fields. 

4) Future plans for the CFS include an operational T126 version. 
Though we feel that LSEFs must occur over an adequate spatial and temporal 
scale to affect TC formation, a higher resolution model may generate higher 
magnitudes and gradients, and more skillful predictions of the LSEFs. 
Experimental runs by CPC of a high-resolution T254 and T382 CFS have shown 
that it has the potential to predict individual TCs and may have skill in 
characterizing overall TC activity (Schemm et al. 2008). Undeniably, a 
comparison between a high-resolution CFS, or comparable system (e.g., from 
ECMWF), and a lower-resolution combined approach as proposed in this thesis 
would be worthwhile. 

APPENDIX A. VARIATIONS OF CLIMATOLOGY 


Climatology is used throughout this thesis as a baseline against which we 
compare our statistical-dynamical prediction method. Not all climatologies are 
created equal, however. The following paragraphs highlight the forms of 
climatology applied in or mentioned in this work, all of which are legitimate, but 
distinct, forms of climatology. 

The most basic form of climatology is sample climatology. As used in this 
thesis, the sample climatology is the average rate of occurrence based on the 
verification dataset. For example, if a TC hit is observed 10 times out of 1,000 
possible day-grid points, the sample climatology would be 10/1,000 or 0.01. This 
form of climatology is used in quantitative verification such as the BSS. 
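
As a minimal sketch of how the sample climatology enters the BSS, the Python below uses the standard definition BSS = 1 - BS/BS_ref, with the sample climatology as the reference forecast. The forecast values are hypothetical and chosen only to illustrate the arithmetic; they are not results from this thesis.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)


def brier_skill_score(forecasts, outcomes):
    """Return (BSS, sample climatology), with BSS = 1 - BS / BS_ref."""
    sample_clim = sum(outcomes) / len(outcomes)
    bs = brier_score(forecasts, outcomes)
    bs_ref = brier_score([sample_clim] * len(outcomes), outcomes)
    return 1.0 - bs / bs_ref, sample_clim


# 10 TC hits out of 1,000 day-grid points, as in the example above.
outcomes = [1] * 10 + [0] * 990
# Hypothetical forecasts: 0.2 at the hit points, 0.005 elsewhere.
forecasts = [0.2] * 10 + [0.005] * 990
bss, sample_clim = brier_skill_score(forecasts, outcomes)
```

A forecast that always issues the sample climatology scores a BSS of zero, so any positive value indicates skill relative to that baseline.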

We also use various forms of raw climatology based on the JTWC best 
track data. Figure 6 in Chapter II is an example plot of raw climatology. This 
plotted data was created by treating each of the 2.5° x 2.5° grid blocks in the 
WNP as individual bins. Looping through a set period of time (e.g., 1970 to 
2007), we counted the number of formations that occur in each bin, then divided 
the number in each bin by the length of time for the given time interval. Based on 
the time interval one chooses, the output values vary numerically—as daily, 
weekly, monthly, etc. probabilities—but the spatial distribution does not. A 
shortcoming of this raw spatial climatology is its lack of day-to-day variation, in 
that the magnitude and distribution of daily probabilities for 27 March are the 
same as 26 August, which we know is not typically the case in the real climate 
system. 
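
The binning described above can be sketched in a few lines of Python. The function name, the domain corner, and the formation points below are hypothetical, and the sketch is not the code used to produce Figure 6.

```python
import math


def raw_climatology(formations, lat0, lon0, n_lat, n_lon, n_periods, res=2.5):
    """Count formations per res x res bin and divide by the number of periods.

    formations is a list of (lat, lon) formation points; lat0 and lon0 mark
    the southwest corner of the domain. n_periods is the number of days,
    weeks, months, etc. in the record, which sets the units of the output.
    """
    counts = [[0] * n_lon for _ in range(n_lat)]
    for lat, lon in formations:
        i = math.floor((lat - lat0) / res)
        j = math.floor((lon - lon0) / res)
        if 0 <= i < n_lat and 0 <= j < n_lon:
            counts[i][j] += 1
    return [[c / n_periods for c in row] for row in counts]


# Hypothetical formation points (degrees N, degrees E) over a 100-day record.
# The last point lies outside the domain and is ignored.
formations = [(10.2, 140.1), (11.0, 141.9), (16.0, 130.0), (70.0, 10.0)]
clim = raw_climatology(formations, lat0=0.0, lon0=100.0,
                       n_lat=16, n_lon=32, n_periods=100)
```

Changing `n_periods` rescales every bin by the same factor, which is why the spatial distribution is unaffected by the choice of time interval.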

A more robust version of climatology, still based on the JTWC data, is one 
that varies in magnitude throughout the year. This form of climatology was 
created by taking a 28-day, Loess-smoothed form of the daily observed TC 
formations for the WNP, dividing by the number of days in the period to give us a 
daily probability that a TC will form somewhere in the WNP on a given day. 
These daily probabilities were multiplied by a normalized spatial distribution of 
the likelihood of TC formation in the WNP. The result is a climatology that 
displays an annual cycle and spatial variation in the output probabilities. This is 
the form of climatology used in creating the difference plots depicted in Chapter 
III of this work. 
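
The construction of this climatology can be sketched as follows. Two simplifications are assumed here and are not taken from the thesis code: a centered 28-day running mean stands in for the Loess smoother, and the smoothed counts are normalized by the number of years of record. All names and input values are hypothetical.

```python
def running_mean_circular(values, window):
    """Centered running mean that wraps around the ends of the year."""
    n = len(values)
    half = window // 2
    out = []
    for d in range(n):
        total = sum(values[(d + k) % n] for k in range(-half, window - half))
        out.append(total / window)
    return out


def seasonal_climatology(daily_counts, n_years, spatial_counts, window=28):
    """Daily basin-wide probability times a normalized spatial distribution."""
    smoothed = running_mean_circular(daily_counts, window)
    daily_prob = [c / n_years for c in smoothed]
    total = sum(sum(row) for row in spatial_counts)
    spatial = [[c / total for c in row] for row in spatial_counts]
    # The result is indexed [day][lat][lon].
    return [[[p * s for s in row] for row in spatial] for p in daily_prob]


# Hypothetical inputs: a 365-day count series and a tiny 2 x 2 spatial grid.
daily_counts = [0] * 200 + [10] * 100 + [0] * 65
spatial_counts = [[1, 3], [4, 2]]
clim = seasonal_climatology(daily_counts, n_years=38,
                            spatial_counts=spatial_counts)
```

Because the spatial distribution is normalized to sum to one, the grid total on any day equals the basin-wide daily probability for that day.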

Figure 45 is an example of the components involved in generating such a 
form of climatology: a) a smoothed version of daily formation counts, b) a 
normalized distribution of spatial climatology, and c) an example of the resulting 
daily probabilities for 1 August. This form of climatology vaguely resembles the 
approach taken by Leroy and Wheeler (2008), who generated a climatological 
seasonal cycle based on raw probabilities smoothed through harmonic analysis. 

[Figure 45 panel titles: a) Loess (quadratic fit) 28-Point Smoothing of Formation Count; b) Normalized Spatial Distribution of Raw Climatology] 

Figure 45. Components used to create a more robust climatology against 
which to compare our method: a) a smoothed form of daily formation 
counts, b) a normalized distribution of spatial climatology, and c) an 
example of the resulting daily climatology for 1 August. 


These preceding forms of climatology are all based on the observational 
JTWC best track data. An approach to generating a pseudo-climatology is 
mentioned in Section III.B.5. Rather than generating probabilities based on the 
number of TCs observed for a given spatial and temporal scale, this approach 
uses the regression model outlined in Table 1 to generate a probability of TC 
formation at every grid point based on LTM LSEFs. A notable benefit of this 
approach is that it is not sensitive to the number of TC formations. For example, 
if one wants to create a climatology for the probability of TC formation for a 
forthcoming exercise in the month of May, a method relying on raw JTWC data 
would depict patchy 
probabilities due to the limited number of storms (e.g., 53 in the month of May for 
the years 1970 to 2006). The spottiness of the output would not accurately 
reflect the large-scale environment, but rather roughly contour the individual 
storm formation points. In contrast, the LTM LSEFs (from one of the NCEP 
reanalyses) when processed by our regression model would result in a depiction 
of climatology much more indicative of the favorability of the typical climate 
system in the month of May. 
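
A minimal sketch of this pseudo-climatology is given below: a logistic model is evaluated at every grid point using long-term-mean LSEFs. The coefficients are hypothetical placeholders, not the fitted values of Table 1, and the input values and their scaling are likewise invented for illustration.

```python
import math

# Hypothetical coefficients for illustration only; the fitted values are
# those of Table 1 and are not reproduced here.
COEFFS = {
    "intercept": -8.0,
    "vort": 0.5,        # low-level relative vorticity
    "vort_sq": -0.02,   # vorticity-squared term
    "sst": 0.1,         # sea surface temperature
    "shear": -0.05,     # vertical wind shear
    "coriolis": 0.2,    # Coriolis
    "div": 0.3,         # upper-level divergence
}


def formation_probability(lsefs, coeffs=COEFFS):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    z = coeffs["intercept"]
    z += coeffs["vort"] * lsefs["vort"] + coeffs["vort_sq"] * lsefs["vort"] ** 2
    for name in ("sst", "shear", "coriolis", "div"):
        z += coeffs[name] * lsefs[name]
    return 1.0 / (1.0 + math.exp(-z))


def pseudo_climatology(ltm_grid):
    """Apply the model to long-term-mean LSEFs at every grid point."""
    return [[formation_probability(cell) for cell in row] for row in ltm_grid]


# One-row grid with two hypothetical long-term-mean grid points (scaled units).
ltm_grid = [[
    {"vort": 1.0, "sst": 29.0, "shear": 10.0, "coriolis": 3.0, "div": 1.0},
    {"vort": -1.0, "sst": 26.0, "shear": 30.0, "coriolis": 3.0, "div": -1.0},
]]
probs = pseudo_climatology(ltm_grid)
```

Because the output depends only on the smooth long-term-mean fields, it varies smoothly in space regardless of how few storms formed in the month of interest.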

As noted, each of the preceding forms of climatology is a different, but 
legitimate, approach to representing climatology. Climatology is both a useful 
tool and a baseline reference forecast. In a situation where no pronounced 
predictable elements appear in the climate system, a state-of-the-science 
climatology may be the best intraseasonal/seasonal outlook one has to offer. 

APPENDIX B. CALCULATION OF VARIABLES 


As only a limited number of variables are available at daily timesteps from 
the CFS, we had to calculate additional variables based on available model 
output fields. In this work, we employed second order centered finite differencing 
for variables requiring spatial derivatives. 

For example, a variable directly representing vertical motion is not readily 
available from the CFS. We surmise that some degree of uplift would exist 
(especially in and around the monsoon trough) if low-level convergence and/or 
upper-level divergence exist. As such, we opted to—among other variables— 
derive 200 mb divergence based on available 200 mb zonal and meridional wind 
fields. 


Take equation 2.21 from Carlson (1998), where horizontal divergence on a 
fixed pressure level is given by: 

$$\nabla_p \cdot \mathbf{V} = \left( \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} \right)_p$$

Holding the area constant, to represent the fixed model grid spacing, the 
horizontal divergence at 200 mb ($D_{200}$) in second order centered finite 
difference form becomes: 

$$D_{200} = \left( \frac{U_{i,j+1} - U_{i,j-1}}{2\Delta x} + \frac{V_{i+1,j} - V_{i-1,j}}{2\Delta y} \right)_{200}$$

where $D_{200}$ is the horizontal divergence at 200 mb, $U$ is the zonal wind, $V$ 
is the meridional wind, $\Delta x$ is the zonal (east-west) grid spacing, and $\Delta y$ is the 
meridional (north-south) grid spacing. Also, $j$ and $i$ are the longitudinal and 
latitudinal indexes, respectively, with $i$ increasing northward. 

Then converting the above equation into MATLAB syntax, the equation for 
200 mb divergence for an array of size (41,144,365) becomes: 

dy = 111319.49*2.5;   % Meridional spacing in meters, based on WGS-84

% Preallocate; boundary rows and columns are left at zero
dudx = zeros(41,144,365);
dvdy = zeros(41,144,365);

for i = 2:40
    % lat is assumed to be a predefined 41-element vector of row latitudes (degrees)
    dx(i) = cosd(lat(i))*111319.49*2.5;   % Zonal spacing in meters at row i
    for j = 2:143
        for k = 1:365
            dudx(i,j,k) = (U_200(i,j+1,k) - U_200(i,j-1,k))/(2*dx(i));
            dvdy(i,j,k) = (V_200(i-1,j,k) - V_200(i+1,j,k))/(2*dy);
        end
    end
end

DIV_200 = dudx + dvdy;   % Divergence at 200 mb; s^-1

Note that MATLAB indexes top to bottom, thus requiring an opposite 
convention on the latitudinal index. Also, U_200 and V_200 are predefined 
variables representing three-dimensional arrays of the 200 mb zonal and 
meridional winds, respectively, and lat is assumed to be a predefined vector 
holding the latitude, in degrees, of each row. 

With this spatial finite differencing, we could just as easily have used fourth 
order finite differencing methods. With the model output variables from which 
such additional variables are calculated being at 2.5° horizontal resolution, we 
felt that fourth order methods would overly smooth the gradients. Figure 46 is a 
comparison of second order versus fourth order finite differencing for 200 mb 
divergence for a sample day in 1991. 
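
For reference, the two stencils can be written out as below. This is a generic Python sketch using the standard centered-difference formulas on an idealized sinusoidal field; the test wave, its wavelength, and the function names are hypothetical and are unrelated to the R2 wind fields used for Figure 46.

```python
import math


def d1_second_order(f, x, h):
    """Second order centered difference: (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def d1_fourth_order(f, x, h):
    """Fourth order centered difference using a five-point stencil."""
    return (-f(x + 2 * h) + 8 * f(x + h) - 8 * f(x - h) + f(x - 2 * h)) / (12 * h)


# Idealized test field: a sinusoid with a 40-degree wavelength, sampled at
# 2.5-degree spacing (x and h are in degrees; purely illustrative).
WAVELENGTH = 40.0
K = 2 * math.pi / WAVELENGTH


def field(x):
    return math.sin(K * x)


x0, h = 5.0, 2.5
exact = K * math.cos(K * x0)
err2 = abs(d1_second_order(field, x0, h) - exact)
err4 = abs(d1_fourth_order(field, x0, h) - exact)
```

The five-point stencil reaches two grid points (5°) to either side of the central point, against one grid point (2.5°) for the three-point stencil. For this smooth test wave the fourth order estimate lies closer to the analytic derivative; how the two behave on actual 2.5° model fields is the subject of Figure 46.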

Figure 46. Comparison of a) second order finite differencing and b) fourth 
order finite differencing for 200 mb divergence on 8 August 1991, 
constructed from R2 wind fields. Panel c) is the difference between a) and 
b). Note the different scales between the divergence and difference plots. 


As noted in item 4) of Section IV.C.1, the calculation of additional variables from 
the available model output fields is an area open to further research. While the 
second order centered finite differencing allows us to readily calculate several 
variables that are based on spatial derivatives, the five-degree “reach” about 
each grid point does result in some gradient loss versus what we might get if 
such variables were directly predicted by the CFS. 

APPENDIX C. ADDITIONAL CASE STUDIES 


Operational CFS Cases 

As an additional resource for the reader, this section includes additional 
plots for storms that occurred in the WNP during the fall of 2008. The probability 
plots that follow are based on the operational CFS, and thus are generated from 
the four-member ensemble. The construct of these cases mirrors Case 1 in 
Section III.D.1. 

The genesis of Jangmi (19W) may be traced back to 24 September 2008. 
Due to the limited availability of daily operational CFS fields, the lead time for this 
case is limited to a four-day lead. Figure 47 depicts the seven-day summed 
probabilities at a four-day lead and a comparison composite OLR plot for the 
same seven-day period. 
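
The seven-day summed, ensemble-mean quantity shown in these case studies can be sketched as below for a single grid point. This is one plausible reading of the description above, not the thesis code; the function name, the data layout, and the member values are hypothetical.

```python
def seven_day_summed_probability(members, center, half_width=3):
    """Average members day by day, then sum over a window centered on center.

    members is a list of ensemble members; each member is a list of daily
    formation probabilities at one grid point. (Hypothetical layout.)
    """
    n_days = len(members[0])
    start, stop = center - half_width, center + half_width + 1
    if start < 0 or stop > n_days:
        raise ValueError("window extends beyond the forecast period")
    ens_mean = [sum(m[d] for m in members) / len(members) for d in range(n_days)]
    return sum(ens_mean[start:stop])


# Four hypothetical members, ten forecast days each, at a single grid point.
members = [
    [0.01] * 10,
    [0.02] * 10,
    [0.03] * 10,
    [0.02] * 10,
]
summed = seven_day_summed_probability(members, center=4)
```

Applying the same function at every grid point would yield a field comparable in form to the probability panels that follow.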


Figure 47. Comparison of a) CFS-based TC formation probabilities from the 
ensemble mean at a 4-day lead and b) OLR for the period of 21-27 
September 2008. The formation point and storm track are marked by the 
green dot and magenta circles, respectively. 

Maysak (24W) was a weak storm whose origins may be traced back to 5 
November 2008. Figure 48 depicts the seven-day summed probabilities at a 
two-week lead and at a three-week lead, and a comparison composite OLR plot for 
the same seven-day period. 





Figure 48. Comparison of a) CFS-based TC formation probabilities from the 
ensemble mean at a 2-week lead, b) at a 3-week lead, and c) OLR for the 
period of 2-8 November 2008. The formation point and storm track are 
marked by the green dot and magenta circles, respectively. 


As a late season storm with unusual formation dynamics, Dolphin (27W) 
makes an interesting case study. JTWC notes the beginnings of Dolphin as early 
as 8 December 2008. Figure 49 displays the seven-day summed probabilities at 
a two-week lead and at a three-week lead, and a comparison composite OLR plot for 
the seven-day period about which the probability plots are centered. 

Figure 49. Comparison of a) CFS-based TC formation probabilities from the 
ensemble mean at a 2-week lead, b) at a 3-week lead, and c) OLR for the 
period of 5-11 December 2008. The formation point and storm track are 
marked by the green dot and magenta circles, respectively. 


Hindcast CFS Cases 

In contrast to the above cases that were based on daily, operational CFS 
output, the cases in this section are based on archived hindcast CFS data. 
Though archived data is used, the lead times are still true-to-form, thus the 
probability plots are contours of probabilities based on forecast variable fields. 
The data used for generating these case studies mirrors the 15-member 
ensemble mean data used for Case 2 in Section III.D.2. 

Jelawat (13W) formed on 31 July 2000, in a location well removed from 
the climatologically favored formation regions. Figure 50 provides a visual 
comparison between the CFS-based probabilities and OLR over the same 
seven-day period. 



Figure 50. Comparison of a) CFS-based TC formation probabilities from the 
ensemble mean at a 22-day lead and b) OLR for the period of 28 July - 3 
August 2000. The formation point for Jelawat is highlighted by the 
magenta dot. 


JTWC lists the formation day for Krosa (24W) as 3 October 2001. Figure 
51 offers a visual comparison between the seven-day summed CFS-based 
probabilities centered on 3 October 2001, based on the 15-member ensemble 
mean, and OLR over the same seven-day period. 

Figure 51. Comparison of a) CFS-based TC formation probabilities from the 
ensemble mean at a 24-day lead and b) OLR for the period of 30 
September - 6 October 2001. The formation point for Krosa is marked by 
the magenta dot. 


As a final case study, Mindulle (10W) formed on 21 June 2004. The 
panels in Figure 52 represent a) the CFS-based probabilities from a 12-day lead, 
b) the CFS-based probabilities from a 43-day lead, and c) the NOAA interpolated 
OLR image from the same period. The OLR images displayed in this appendix 
and throughout this thesis are courtesy of the Physical Sciences Division, Earth 
System Research Laboratory, NOAA, Boulder, Colorado, from their Web site at 
http://www.esrl.noaa.gov/psd/. 


Figure 52. Comparison of a) CFS-based TC formation probabilities from the 
ensemble mean at a 12-day lead, b) a 43-day lead, and c) OLR for the 
period of 18-24 June 2004. The formation point for Mindulle is marked by 
the magenta dot. 